Natalia Filatkina, Sören Stumpf & Christian Pfeiffer
Introduction: Formulaic Language and New 
Data 


1 Preliminary Remarks: What Do We Know? 


The existence of formulaic patterns in the widest sense (including phrasemes, 
constructions, non-literal units and/or other prefabs) has been hypothetically
attested for all the languages of the world. Probably the most extensive attempt to
grasp the complex nature of such utterances was undertaken within the frame- 
work of phraseology. The complexity was reflected already in the defining criteria 
of phrasemes. According to Burger (2015: 14-15), phrasemes are polylexical items 
that must consist of at least two constituents, have a more or less stable form in 
which they are frequently reproduced by speakers and can (but need not) be
idiomatic in meaning. Research was traditionally focused mainly on one type of 
polylexical word combination, namely idioms such as to spill the beans or to
break the ice, because they met all the criteria mentioned above and were there- 
fore considered to be at the centre of the phraseological system. 

But as newer linguistic theories such as usage-based approaches to Construc- 
tion Grammar (Fillmore 1988; Goldberg 1995), corpus linguistics (Steyer 2013) or 
text and discourse studies (cf. most recently Stumpf and Filatkina 2018) show, 
the formulaic character of human communication reaches far beyond the items 
that can meet the criteria of phrasemes (Stein and Stumpf 2019). It encompasses 
single word conventionalised structures such as routine formulae like and?, con- 
gratulations!, frankly (speaking), adverbial/prepositional constructions (notwith- 
standing), word formation, syntax on the one hand and formulaic text genres 
such as contracts, business correspondence, newsletters, recipes, announce- 
ments etc. on the other. Language acquisition (Tomasello 2003) and language 
loss (Wray 2008, 2012) are strongly interwoven with formulaic patterns. In second 
language teaching, too, formulaic items are now considered a key aspect of lan- 
guage competence (Lewis 1993). This new understanding of the constitutive role 
of formulaic patterns is the first central starting point of the current volume. 

The second point concerns the notion of “new data”. At first sight, the appeal
for the inclusion of “new empirical data” might not seem so new for modern
linguistic research. It has been in demand since the development of corpus and 
computational linguistics in the 1960s. However, the appeal was restricted to the
analysis of large text corpora that to this day consist (not exclusively but
predominantly) of written data from standard languages. Even within the frame- 
work of the above-mentioned newer paradigms, systematic research has been fo- 
cused on only a few European standard languages with a rich literary tradition 
and a predominantly written norm. It was on the basis of these data that the the- 
oretical framework, classification criteria and methodological approaches of var- 
ious research directions dealing with formulaic language were developed. The 
most recent proof of this statement can be found in Häcki Buhofer’s introduction
to volume 9 of the “Yearbook of Phraseology”: 


Linguistic research has dealt with the semantics of lexics in general and of phraseology in 
specific time and again, and rightly so. The present volume offers the desired spectrum as 
far as the languages examined are concerned, by presenting articles on Russian, English 
and others. At the same time, studies on rare and small languages and languages in the 
process of getting extinct remain a continuing desideratum. While quite a number of studies 
have investigated such languages from a general point of view, only few have taken on a 
phraseological perspective. 

(Häcki Buhofer 2018: 1) 


The current volume does not neglect the necessity and importance of corpus-based
approaches but it goes far beyond that and suggests a shift of focus by placing
other new data at the centre of scholarly research. Within the framework of
this volume, the “new data” are understood as data from 1) areally limited and 
lesser-used languages, 2) languages spoken outside Europe, 3) linguistic varieties 
used in spoken domains and/or regarded as ‘conceptually oral’ and 4) data from 
the earlier/historical stages of language development. As first studies show, the 
systematic inclusion of these data challenges the existing postulates of research 
on formulaic patterns at both theoretical and methodological levels in a different 
way from the challenges that corpus-driven and corpus-based approaches
brought decades ago. What we already know now is that, at the theoretical level,
the challenges affect primarily the role of linguistic genetic affiliation, intertex- 
tuality, variation/modification, normatisation/codification, regularity/analogy 
and frequency in the process of formulaic language formation. In what follows, 
we give a short outline of available scholarly knowledge for each of these phenomena.


1.1 Genetic Affiliation and Intertextuality


The most extensive attempt to include “new data” into the research on formulaic
language was Elisabeth Piirainen’s project “Widespread Idioms in Europe and
Beyond (WI)” (Piirainen 2012, 2016). It was dedicated to the classification of
cultural phenomena in idioms of modern language varieties and had access to 78
modern standard and lesser-used languages from all language families as well as 
dialects. The project identified 470 idioms as similar and widely known. Current- 
ly, a similarly large-scale project devoted to dialects, spoken data and/or histori- 
cal languages of the mediaeval and early modern periods does not exist. 

Two results of the WI-project are of particular importance. Firstly, earlier 
ideas that the same genetic affiliation of two or more languages could explain a 
similarity on the level of idioms have been disproven. These ideas disregard the 
fact that the origin of the majority of idioms does not go back to a common
“proto-language” of the distant past. As the project’s results make clear, the
distribution of idioms crosses genetic boundaries. Secondly, the concept of a
“common (European) cultural heritage”,
which was also often used for explanation of similarities in earlier works, requires 
a more detailed investigation. Until now, cultural traditions from Classical Anti- 
quity, Christianity (the Bible), the Renaissance, Humanism, and the Enlighten- 
ment have been included in this term. Though the role of these domains remains 
central, other cultural domains such as folk narratives, jokes and legends appear 
to be significant as well, particularly for formulaic patterns in dialects, areally 
restricted, lesser-used and/or predominantly orally used languages. These do- 
mains have produced numerous widespread idioms (to fight like cat and dog, to 
shed crocodile tears) and have not yet been listed under the concept of “common
(European) cultural heritage”. Today's convergence of idioms is the product of
an intense exchange of thoughts and ideas among educated language users that 
could only have been based on writing and reading books in historical times. This 
shared knowledge of widely disseminated written and oral texts led to and sup- 
ported the establishment of cultural memory and many formulaic patterns such 
as idioms and proverbs. The WI-project described this phenomenon using the 
term intertextuality and called for its precise validation in individual languages, 
particularly those outside of Europe, as well as dialects and lesser-used lan- 
guages (Dobrovol'skij and Piirainen 2005; Piirainen 2012: 520). 


1.2 Variation and Modification 


One of the major achievements of phraseological research in recent years is the 
understanding that even highly idiomatic units, such as to cast pearls before 
swine, are not as fixed as was previously thought. As the first results of
diachronic studies show, at the historical stages of a language, fixedness or stability
can only be attributed to a basic structure underlying a formulaic pattern. The 
patterns that might be considered formulaic in a certain language at the current 
point in time are always products of a complex process of change, which is inher- 
ently enabled by variation. However, at the current state of international re- 
search, for the majority of languages, systematic studies into the diachronic pro- 
cesses of the emergence of what is considered formulaic in modern languages 
face methodological difficulties, a theoretical vacuum and most importantly a 
lack of empirical data (Filatkina 2012, 2013, 2018a, b, c). Since its establishment 
in the 19th century, historical linguistics has been strongly focused on the description
of various but single and isolated linguistic domains such as phonetics, grammar 
or the lexicon. The variation and change of formulaic patterns as one basic con- 
dition of human communication remain a fundamental research question for all 
languages without exception and are often completely neglected, even in publi- 
cations claiming the status of reference works on language change (for a detailed 
overview cf. Filatkina 2018c: 57-96). 

As shown in Filatkina (2013) and Piirainen (2000), formulaic patterns under- 
go diachronic changes at absolutely all levels: structure, semantics, pragmatics, 
ways of syntactic contextualization, distribution in texts, stylistic connotations, 
frequency of use, degree of familiarity, cultural image component and so on. 
However, the assumption that formulaic patterns emerge due to a decline in var- 
iation should be reconsidered. Though the pivotal role of the decline in variation 
has been most clearly demonstrated for orthographical (Kohrt 1998), phonetic 
(Kohrt 1998) and morphological (Werner 1998) norms, it does not appear to be 
relevant to formulaic patterns. On the contrary, variation can be an indication of 
the completion of a conventionalisation process and the establishment of a new 
utterance: Only after a pattern has reached a high degree of fixedness and
conventionalisation can it become subject to variation and/or modification by
language speakers and still remain recognisable and understandable for them (cf.
Burger 2012 for collocations in German). 

Similar research results come from the first works on varieties, dialects and 
colloquial languages handed down orally, including Luxembourgish, which
is distinguished by its dialectal origin and the domain of orality. Piirainen (in this 
volume) sums up the findings very precisely: 


[They] showed deviations from the hitherto established theories, e.g. regarding the stability 
or variability of idioms, the so-called anthropocentrism, usage restrictions of idioms 
(among them gender restrictions which are due to certain images), as well as specific
pragmatic functions of conventional word plays, all of which up to that time had not been
known to this extent. 


Synchronic mechanisms of variation and/or modification have been studied in 
detail within the framework of phraseology, particularly using data from
standard English(es), German, Russian, French, Italian and Spanish.¹ Despite the
numerous studies, no theoretically viable distinction between variation and
modification has been reached so far (Pfeiffer 2016, 2017, 2018; Pfeiffer and Schiegg,
in this volume). The former is generally understood as a conventional and regular 
phenomenon that is independent of particular contexts and compatible with the 
norms of usage of a given language. The different variants are usually not only 
expected to occur with a certain frequency, but also to be stored in the mental 
lexicon and should thus be codified in dictionaries. By contrast, modification is 
defined as an intentional and conscious intervention by a speaker into a common 
form and/or meaning of a formula. Modifications represent an occasional phe- 
nomenon that occurs in a specific context. Thus, they allow for unexpected se- 
mantic-pragmatic effects on the part of the hearer and are used creatively as a 
favoured tool of wordplay, e.g. in mass media headlines, fiction or
commercials. The functions and mechanisms of modifications have been described in
detail for a relatively small number of written standard languages. Once again,
however, lesser-used languages, oral communication and dialects (Piirainen 1995,
2007, 2008) continue to be underrepresented in this research area. The same 
holds for historical stages of modern languages (cf. however Pfeiffer and Schiegg, 
in this volume, for 19th-century German lower-class letter writing).


1.3 Normatisation and Codification 


The decline of variation in the process of emerging phonetic, morphological and 
orthographic conventions in language use has often been attributed to the nor- 
mative influence of dictionaries and grammar books. This is where the decline 
predominantly took place as the lack of variation was treated as a necessary char- 
acteristic of language norms in historical times. With regard to formulaic pat- 
terns, this does not hold true as dictionaries, historical collections of proverbs 
and idioms as well as chapters dedicated to formulaic patterns in early grammar
treatises have been compiled with goals rather different from that of a
prescriptive establishment of norms (Filatkina 2016; Hundt 2000; Moulin 2016).
Therefore, older texts and collections differ substantially with regard to the
formulaic patterns they include (cf. Filatkina 2018c: 97-127 and 128-141 for Old
High German). The same holds true for dialectal, areally restricted data and
phrasemes in lesser-used languages where dictionaries and grammar books might not
exist at all (Piirainen 2007, 2008).

1 For reasons of space, only a small selection of scholarly work can be given here: Burger (2015);
Dobrovol'skij (2013); Dobrovol'skij and Piirainen (2009); Langlotz (2006); Pfeiffer (2018); Sabban
(1998).


1.4 Regularity and Analogy 


In the same way, the explanation of the development of formulaic patterns and 
their variation just as a case of regularity and analogy would be a simplification 
of the actual state of affairs. Norm conflicts and preservation of lexical and/or 
grammatical constituents that have to be regarded as obsolete or irregular from 
the point of view of free language use are widespread phenomena in the
formation of formulaic patterns. A corpus-based attempt to prove the high degree of
“regular irregularity” (in terms of norm conflicts and/or preservation of obsolete 
lexical/grammatical constituents) in the emergence of formulaic patterns is un- 
dertaken in Stumpf (2015, 2018, 2019) and based on data from standard modern 
German. 

Within the framework of Construction Grammar, variation, regularity and 
analogy are considered intrinsic features of constructions (Goldberg 2003: 221- 
222). Variation is governed by the principles of inheritance, analogy and family 
resemblance, meaning semantic or phonological similarity between new and ex- 
isting forms, relational knowledge and structural alignment. The conflict be- 
tween these principles should allow for creativity, especially in predominantly 
oral communication, but this point has not yet been made clear. Bybee (2010: 58)
uses the above-mentioned principles for a fine-grained analysis of the variation
potential of the construction it drives me Xa, but does not discuss a novel
utterance like it drives me happy as a possible creative modification (a
construct?) in certain contexts. In her eyes, it is just unlikely because, owing
to the principles of analogy and family resemblance, the drives-construction goes
with adjectives and phrases indicating madness or insanity. Research into the
micro-steps of variation and particularly the role of regularity, analogy and
creative modifications² in new sources as defined in the current volume still
requires a lot of attention in order to satisfy the far-reaching claim of
Construction Grammar “to account for the full range of facts about language,
without assuming that a particular subset of the data is part of a privileged
‘core’” (Goldberg 2003: 219).

2 From the constructionist point of view, the role of creative modifications is studied in Stumpf
(2016) using data from modern German.


1.5 Frequency 


Theories of language variation and change (morphological, typological, lexical 
and semantic) stress the pivotal role of frequency in any process of emergence of 
new items. It is a well-known fact that in the process of lexicon expansion, for
example, a sporadic innovation only has a chance of entering the lexicon if it is 
supported by a sufficient number of speakers, i.e. if they frequently use the item 
in a new form and/or meaning and function. It goes without saying that the
emergence of formulaic patterns involves frequency. But another fact has to be taken
into account as well: Formulaic patterns are constitutive elements of human
communication only with regard to their type frequency; by contrast, their token
frequency is generally low. In other words: A certain degree of formulaicity can be
attested in absolutely any written text or oral communicative act because any of
these sources contains different types of formulaic patterns (type frequency). The
problem is that each type might occur only once (token frequency).
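
The distinction drawn above can be made concrete with a small counting sketch. It is only an illustration under stated assumptions: the list of occurrences is invented for the purpose, it stands in for a hand-annotated text, and the names `occurrences` and `frequency_profile` are hypothetical. "Type frequency" is taken here in the sense used above (how many different patterns a source contains) and "token frequency" as the number of occurrences of each individual pattern.

```python
from collections import Counter

# Hypothetical, hand-annotated occurrences of formulaic patterns in one short
# text; every list entry is one occurrence (token). Illustrative data only.
occurrences = [
    "break the ice",
    "congratulations!",
    "frankly speaking",
    "break the ice",
    "spill the beans",
]


def frequency_profile(tokens):
    """Return the number of distinct pattern types and the token count per type."""
    counts = Counter(tokens)
    return len(counts), counts


type_count, token_counts = frequency_profile(occurrences)

# Pattern types attested exactly once in the sample.
singletons = sorted(t for t, n in token_counts.items() if n == 1)

print("distinct types:", type_count)
print("types occurring only once:", singletons)
```

Even in this toy sample most types occur a single time, which is the situation described above: the text is formulaic throughout, yet no individual pattern is frequent within it.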

What seems to be a crucial factor for the emergence of formulaic patterns is 
not so much just the frequent use of a pattern but its frequent use in a specific 
situation of communication (oral or written!) as well as in a specific (cultural)
text/discourse tradition (Stumpf and Kreuz 2016; Stumpf and Filatkina 2018). The 
link between a formulaic pattern and a context ensures that speakers resort to 
appropriate (even the most irregular!) units in respective situations. Evidence for 
such links has already been provided from different research perspectives and
various modern languages (cf. Feilke 1994: 226 for German; Koch 1997 for French; 
Wray 2009: 36 and Wray and Perkins 2000: 7 for English), recently also within 
the fine-grained concept of construction discourse and the notion of discourse 
patterns in Östman (2005, 2015). The “new sources”, particularly in the sense of
spoken data, areally restricted or lesser-used languages, seem to support this
evidence even more strongly. More research is therefore needed here.


2 Where Do We Go from Here? - This Volume


Starting from the research findings briefly sketched above, the current volume
tackles the following questions:

— What is formulaic in the “new types” of languages, varieties and dialects?

— Are the criteria developed within the framework of traditional phraseological
research (e.g. fixedness, idiomaticity) applicable to “new data”? 

— Can any specific types of formulaic patterns and/or any specific features (se- 
mantic, structural, pragmatic etc.) of regular (already known) types of formu- 
laic patterns be observed and how do they emerge? 

— What methodological difficulties need to be overcome when dealing with 
“new data”? 


2.1 Lesser-Used and Areally Limited Languages 


The first part of the volume brings together studies based on the data from areally 
limited and lesser-used languages. Elisabeth Piirainen’s contribution provides 
the framework for this section and aims to bring together phraseological research 
and studies on formulaic and figurative language of lesser-used, mainly unwrit- 
ten languages, from anthropology and ethnology. The term lesser-used languages 
is applied generically “for smaller and minority languages, which show a down- 
ward trend of influence” (Piirainen, in this volume) and which do not fulfill the 
criteria for their intergenerational transmission. In the context of the article, the 
term covers non-Western minority languages of the Austronesian language 
groups (Kilivila and Kewa), Basque as an isolate spoken in several varieties on 
both sides of the Western Pyrenees, Flathead Salish, a critically endangered 
American Indian variety in Montana, USA, and Inari Saami, a declining minority 
language on the edge of Northern Europe; some examples are taken from ethnic 
African languages. The study investigates body-part semiotizations, conceptual 
metaphors and pragmatic functions of figurative units in such languages. The re- 
sults are threefold: Firstly, the inclusion of new, previously unresearched lan- 
guages clearly shows that the symbolic value of semiotized body parts and inner 
organs is significantly different from that known in Western written languages. 
Secondly, the postulate of universality of such conceptual metaphors as TIME IS 
MONEY or UNDERSTANDING IS SEEING cannot be sustained. Thirdly, the entire com- 
plex of figurative secret languages, “veiled languages” and “tabooed languages” 
in Papua New Guinea appears to have no equivalent in the Western world. 


Areal variation and change in oral modern German is studied by Stephan
Elspaß. The novelty of this topic is remarkable as, at least to our knowledge, there is
a complete lack of studies on phraseological change both in contemporary
German and in any other modern language. Elspaß refers to data that has been
obtained in three recent research projects on German areal linguistics: Atlas zur
deutschen Alltagssprache (AdA) ‘Atlas of colloquial German vernacular’ (with
internet survey data from mostly spoken regional vernaculars),
Variantenwörterbuch des Deutschen (VWB) ‘Dictionary of lexical variation in German’ and the
Variantengrammatik des Standarddeutschen (VG) ‘Regional variation in the
grammar of Standard German’ (with data from large regionally-balanced corpora of
the written Standard German in Germany, Austria and Switzerland). These new 
data are compared with data from the Wortatlas der deutschen Umgangssprachen 
(WDU) ‘Word Atlas of colloquial German’, collected in the 1970s and 1980s, and 
with the findings in Piirainen (2009). The study reveals a number of develop- 
ments in the areal distribution of phrasemes both on the level of colloquial 
speech and in standard written German which have occurred in recent decades. 
In addition to his findings on phraseological change, Elspaß also shows a) that 
there are significant differences between awareness and actual usage of phrase- 
ological units and b) that the representation of areal phraseological variation in 
dictionaries is often misleading or even incorrect. This applies particularly to the 
phraseological dictionary edited by Duden (Duden 11), while the situation is con- 
siderably better for the VWB. 

Basque collocations formed by onomatopoeia and verbs in a corpus of
translated literary texts are the subject of investigation in Zuriñe Sanz-Villar’s
contribution. The Basque language has only a weak tradition of written literature and
its standard variety only a short history. As Sanz-Villar notes, there has been no 
systematic research in the field of Basque phraseology and even less attention 
has been paid to the study of the translation of phraseological units from/into 
Basque. The benefit of the inclusion of Basque into research on formulaic lan- 
guage already becomes apparent at the typological level: Even though Basque 
phraseology still remains underinvestigated, previous research has already iden- 
tified collocations formed by a partially or totally reduplicated onomatopoeia and 
a verb as a special type of formulaic pattern in Basque. Sanz-Villar selected 66 
types and 162 tokens semi-automatically from her corpus and queried them in the 
TraceAligner program for the subsequent translation analysis. The translation 
analysis in its turn has shown that, despite the predominance of the translation 
option when the counterpart of the Basque collocation is a single verb in the Ger- 
man source text, the nuances hidden behind it are of great significance from a 
translation point of view; indirect translations are not an exception but rather a
widespread reality in German-into-Basque translations. 


2.2 Languages Spoken outside Europe 


Three contributions in Part II of the current volume offer insights on formulaic 
language from the perspective of three languages spoken outside Europe: Korean 
(Buerki), Classical Arabic (Eisa) and spoken Jordanian Arabic (Badarneh). 

Andreas Buerki tackles the questions of how formulaicity may be understood
across typologically different languages and whether there is indeed a concept of
formulaic language that applies across languages. Using a new data set consist- 
ing of topically matched corpora in three typologically different languages (Ko- 
rean, German and English) and a constructionist view of linguistic signs, this 
study proposes a quantitatively founded statement that formulaic language has 
to be regarded as a language-specific phenomenon. The conclusion results from 
the observation that though formulaic patterns are in evidence in a very large 
number of languages, their density of occurrence varies greatly between lan- 
guages of different types. A cross-linguistically viable concept of formulaic lan- 
guage cannot be centred at any particular structural level (such as sequences of 
words, phrases or polylexicality) and has to incorporate more abstract elements 
specified at varying levels of schematicity. Buerki's broader view on formulaic 
language coincides with the perspective of the current volume regarding the 
place of formulaic patterns in overall theories of language: Such utterances can- 
not be ignored as insignificant grammatical exceptions or treated marginally as 
only random stylistic/aesthetic phenomena; rather, they should be recognised as 
equally prominent linguistic means of communication in an integrative model of 
language. 

The contribution of Abdullah Eisa is based on similar ideas and demonstrates 
the difficulties that emerge when typological criteria of formulaic patterns estab- 
lished on the data of standard languages (English) are to be applied to Arabic 
phraseology. The criteria in question are the notion of word, polylexicality, flexi- 
bility, frequency, adjacency and idiomaticity/semantic unity. Even though these 
criteria have been described as problematic also in the framework of traditional 
research on the phraseology of standard written languages, Eisa's study makes it 
clear that “new data” shed light on more general issues and can illuminate what
is required for a complete account of linguistic variety and complexity. 

The third study in this part of the book explores the use of politeness formu- 
laic expressions in everyday social interaction in colloquial Jordanian Arabic. 
What makes the contribution of Muhammad A. Badarneh interesting for the
current volume is not only the novelty of the data set but also the results of the
analysis. On the one hand, the studied formulaic expressions pose no theoretical
problems for their description within the well-known concept of positive and neg- 
ative politeness. Positive formulae in Jordanian Arabic are used in interactional 
and transactional contexts and emphasize solidarity and communal belonging in 
the same way as in other languages studied with regard to this; negative polite- 
ness formulae are concerned with showing deference and non-imposition. Fur- 
thermore, the study supports the notion that formulaic expressions are central 
elements of polite communication in colloquial Arabic in Jordan in a similar way 
to those in any other language. On the other hand, they are different with regard 
to the cultural and social traditions in which they are strongly embedded:
According to Badarneh, many of these formulae involve reference to God and
emphasize the religious and fatalistic nature of the community they are used in. As
the majority of formulaic patterns are oriented toward positive rather than
negative face, Badarneh concludes by emphasizing the positive politeness leanings of
Jordanians and their concern with solidarity and acquaintance, collectivist satis- 
faction, and communal belonging, as opposed to individualism and personal 
space. 


2.3 Linguistic Varieties Used in Spoken Domains and/or 
Regarded as ‘Conceptually Oral’ 


Spoken data are at the center of the contributions in the third part of the current 
volume and demonstrate that formulaic patterns are dynamic linguistic utter- 
ances that emerge not only in language history but also in most recent times as a 
reaction to social, political, historical and cultural changes. 

Joanna Szerszunowicz draws upon the notion of the so-called new pragmatic
idioms or pragmatemes in Polish and suggests an integrated approach to their 
study. The integrated approach means that the analysis is not restricted to lin- 
guistic aspects of pragmatic formulaic patterns, but also takes into consideration 
other factors, for instance, their cultural background and the cultural-historical 
context in which they emerge. Szerszunowicz’s specific interest focuses on pat- 
terns that came into existence after 1989, the year of Poland’s political and eco- 
nomic transformation. The analyzed idioms confirm the increasing influence of 
the English-speaking world on the Polish communicative style and changing lan- 
guage behaviors in the new reality, in which the quality of being friendly and nice 
gains a new dimension. Other examples can be traced back to the problems of 
budding Polish democracy or illustrate recent changes in social perception of the
weekend. 

In contrast, Mareike Keller uses recordings of German-English informal con- 
versation not to study the emergence of new expressions but rather to address the 
issue of the storage and processing of phrasemes. Though this issue has been
discussed extensively in previous research, a consensus with regard to the
degree to which phrasemes are stored and processed holistically or compositionally
has not been reached so far. Spoken bilingual data appear to be particularly fruit- 
ful for the continuation of this dialogue on account of the large number of code- 
switching utterances that shed new light on both syntactic and semantic levels of 
patterns. As Keller states, they provide empirical evidence for the unitary storage 
of phrasemes at the conceptual level as well as for their compositional assembly 
in accordance with structural code-switching constraints during language pro- 
duction. 


2.4 Earlier/Historical Stages of Language Development 


The last part of the volume draws attention to data from historical stages of lan- 
guage development. Marie-Luis Merten's paper examines Middle Low German le- 
gal writing in the Late Middle Ages and the Early Modern Period (1227 until 1567) 
from a diachronic perspective. Despite a vast amount of research, Middle Low 
German can still be considered an underinvestigated historical language,
especially from the point of view of its formulaicity. What is particularly remarkable
about Merten's paper is her attempt to investigate historical data within the 
framework of Construction Grammar, a theory which has traditionally been
dominated by synchronic approaches. Merten interprets evolving and changing
constructions of legal writing in connection with the changing communal
constructicon, i.e. a socio-cognitive network, a repertoire of constructions shared by legal
writers of that time. For the analysis of diachronic formulaic patterns it is crucial 
to develop a theoretical framework that is capable of coping with phenomena of 
language in transition and both formulaic (lexical) expressions and more com- 
plex form-meaning pairs between fixedness and variability. This point was al- 
ready made strongly at the beginning of the introduction to this volume and is 
emphasized by Merten. Approaches as shown by Merten can in their turn contrib- 
ute significantly to the development of Construction Grammar as they include the 
cultural and historical context in the analysis of formulaic patterns, a perspective 
which is just starting to find its way into Construction Grammar. 

Christian Pfeiffer and Markus Schiegg conclude the volume with a fine-grai- 
ned study of sources that can be regarded as formulaic in a different sense from 
legal writings. They examine the use and functions of religious formulae in historical
lower-class letters - a data set taken from the Corpus of Patient Documents
(CoPaDocs), a new corpus of 19th- and early 20th-century texts written by pa- 
tients in German psychiatric hospitals which has not yet been systematically in- 
vestigated from the perspective of formulaic language. A factor that is of great 
importance for the current volume (and historical linguistics in general) is the 
fact that most of the letters were written by lower-class people with only a poor 
education. Hence, the letters permit an insight into the use of formulaic language 
by ordinary people in the 19th century, opening up a wonderful perspective such
as presents itself only rarely to scholars dealing with earlier periods in history.
The authors choose a functional approach and present an extensive analysis of 
the pragmatic functions of religious formulae in these texts. However, they also 
contribute to the above-mentioned challenge of differentiating between in- 
stances of variation and modification. A valuable contribution to the volume is 
the authors' conclusion that the tendency to use formulaic items creatively has a 
long tradition and is not a development of recent decades. The modifications they 
found do not seem to have the aim of wordplay but are most obviously produced 
to achieve particular communicative goals. Based on an exemplary intertextual 
analysis, the authors finally raise the question whether there exists something 
like a European tradition of letter writing and a common stock of formulaic items 
and call for further contrastive research on historical letter writing. 


The contributions to this volume take up, each in its own way, the scientific
ideas of our colleague Elisabeth Piirainen. In the hope that Elisabeth's
work will be continued we dedicate this volume to her. 
Part I: Lesser-Used and Areally Limited
Languages 


Elisabeth Piirainen 

Lesser-Used Languages and their 
Contribution to the Study of Formulaic and 
Figurative Language 


The Judeo-Christian tradition sees the profusion of tongues after the Tower of Babel as a 
negative outcome punishing humans for their presumption, and standing in the way of co- 
operation and progress. But the Warramurrungunji myth reflects a point of view much more 
common in small speech communities: that having many languages is a good thing because 
it shows where each person belongs. 

(Evans 2010: 6; italics by E.P.) 


Abstract: So far phraseology research has been carried out for a few major literary 
languages. In addition, there is a remarkable number of studies on formulaic and 
figurative language of lesser-used, mainly unwritten languages in anthropology
and ethnology. Both spheres of research have, for the most part, the same object- 
linguistic data, but have so far little knowledge of each other. This paper is a 
sketchy report on some underestimated studies that can enhance our knowledge 
of phraseology and formulaicity. Three areas have been chosen which have often 
been the subject of research: body-part semiotizations, conceptual metaphors 
and pragmatic functions of figurative units. The inclusion of non-Western minor- 
ity languages reveals various previously unknown peculiarities. The article aims 
to encourage scientists to include new data in phraseology and formulaic lan- 
guage research, i.e. to study languages which have not yet been investigated with 
regard to their phraseology and formulaicity, including varieties which exist 
merely in oral form. 


1 Preliminary Remarks 


Research on phraseology, with a focus on formulaic and figurative language, ex- 
tends for the most part to a few current European standard languages, which all 


Note: The article is published in the original version as written by the author. On account of her 
sudden and unexpected death, the editors of the volume have refrained from including the com- 
ments of two anonymous reviewers. 
together account for less than one percent of the world’s languages.¹ The data
underlying European phraseology research and its results are quite coherent. 
These data are derived from some well-researched literary languages, above all 
from the Western cultural area. All these languages essentially fulfill the same 
communicative functions: they have developed for transregionally valid written 
and oral communication purposes of a complex modern society. The high degree 
of figurativeness of these standard European languages, as manifested in idioms 
and other figurative lexical units, appears equally uniform. It is not just the origin 
of many idioms from the so-called “common European cultural heritage”, but the
same metaphors and symbols, conceptualizations of abstract content that constitute
these consistencies.²

Lesser-used languages have so far hardly been included in the phraseology
research spectrum.³ Two languages on the fringe of the European continent, namely
Inari Saami and Basque, constitute an exception. Both languages have been
thoroughly investigated with regard to their figurative language, and these stud- 
ies reveal metaphors and conceptualizations limited to their linguistic area, un- 
paralleled by other European languages. These studies show very clearly how im- 
portant it is to incorporate new language varieties: new data is urgently needed 
for phraseology research, especially from languages that are predominantly used 
in oral form. The subject of this article is lesser-used languages from the point of 
view of whether they could contribute to our knowledge of phraseology and figu- 
rative language. 

But first let us look at the term lesser-used language. It will be used as a ge- 
neric term for smaller and minority languages, which show a downward trend of 
influence. The term cannot be defined by linguistic criteria; rather, there exist 
extra-linguistic, political, social and economic factors which constitute this term. 
Here we refer to the explanations of relevant standard works, which instead of 
one definition give a bundle of criteria: When one of these criteria is met, it is a 


1 They belong to the Indo-European languages, the Finno-Ugric languages (Hungarian, Fin- 
nish, Estonian), some Turkic languages and Georgian, a South Caucasian language. 

2 Only on a concrete level of the source domains, such as national literature and history or nat- 
ural environment and material culture, can some idiosyncrasies of idioms of these standard lan- 
guages be observed. 

3 And vice versa: studies on lesser-used languages usually excluded phraseology. The 567-page 
volume The Cambridge Handbook of Endangered Languages (Austin and Sallabank 2011), for ex- 
ample, dedicates just ten lines to the topic “Idioms and proverbs”, including commonplaces: 
“One might also wish to include idioms and proverbs because they reflect the culture of a speech 
community more than any other kind of linguistic unit; however, the explanation of their mean- 
ing and use can be difficult” (Austin and Sallabank 2011: 349). 
question of a lesser-used language.⁴ The most important condition is that the intergenerational
transmission of the language is not guaranteed. Further criteria
are, among others: the language is restricted to a small area with few speakers; 
the number of speakers is obviously diminishing; the domains of usage are lim- 
ited to unofficial situations; there is direct competition with the prestigious (na- 
tional) language; standardization and written tradition are missing; the language 
exists mainly in oral form. 

On the one hand, there would be no lack of research topics, as less than
one percent of the world's languages have been studied in terms of their phraseology
and figurativeness so far - not to mention all modes of spoken language as
they appear in the regional colloquial varieties and dialects around the world. On 
the other hand, there is extensive research on the stereotyped nature and figura- 
tiveness of non-literary languages, the data and results of which have hardly 
been noticed by traditional European phraseology research. A number of publi- 
cations can be found on proverbs and other figurative or formulaic units in lan- 
guages outside Europe, including lesser-used and indigenous unwritten lan- 
guages. 

For a hundred years, anthropologists and ethnologists have gathered data on 
the figurative and formulaic language of distant cultures on far-away continents, 
which were partly taken note of by (ethnology-oriented) paremiology, but not by 
linguistic phraseology research. When it comes to incorporating “new data” into
research, the first step would be to consider these studies. For in the area of
stereotyped and figurative expressions, proverbs and metaphors, these studies 
are not only concerned with the study of phraseology, but also have the same 
linguistic elements as the object of their research. The same applies to the lesser- 
used minority languages in Europe, as well as to dialects that have only recently 
been added to phraseology research: they too have not been adequately consid- 
ered, although they can certainly extend the knowledge of phraseology to date. 

This paper is primarily a sketchy report on these underestimated studies. The 
following remarks are grouped around individual results: Section 2 deals with the 
results of the studies of languages distant from standard European varieties, in- 
cluding those of the Austronesian language groups, which have specific concep- 
tualizations of certain internal organs and body parts, as well as concepts in 


4 Cf. Allardt (1984); Fishman (1991); Hale et al. (1992); Matras (2003); Harrison (2007, 2010); 
Krauss (2007); Flores et al. (2010); Austin and Sallabank (2011); Lewis et al. (2014). Also, the def- 
initions of the contrastive terms standard language or literary language must be omitted here and 
the literature be referred to (e.g. Lewis et al. 2014). 
Basque which have no parallels in the standard European languages. Subsequently,
some supposedly “universal” conceptual metaphors are considered. The
inclusion of new, previously unresearched languages clearly shows that the pos- 
tulate of universality cannot be sustained (section 3). 

Pragmatic functions of proverbs and other figures of speech in several lesser- 
used, partly indigenous languages are the focus of the fourth section. The entire 
complex of figurative secret languages, “veiled languages” and “tabooed lan- 
guages” in Papua New Guinea, for example, has no equivalent in the Western 
world. This topic is related to the classification of phrasemes or formulaic units. 
From the point of view of the languages outside Europe, the categories of the subject
of investigation can be quite different from those of European phraseology
research. Sections 2-4 thus aim to confront the uniformity of the written Euro- 
pean standard languages which have been studied so far with a diversity and dis- 
similarity outside this field, while section 5, Concluding remarks, provides refer- 
ences to current phraseology research. 


2 Semiotized Concepts of Inner Organs and Body 
Parts 


The far-reaching uniformity of European literary languages — as opposed to the 
diversity of lesser-used and non-European languages — can be demonstrated by 
the so-called “body part idioms”.⁵ From the beginning of phraseological research,
the human body has been considered an extensive source domain and there has 
been a long tradition of analyzing somatisms (idioms with body part constitu- 
ents). A wealth of publications on European standard languages provides a uni- 
form picture of this group of idioms, especially of the symbolic functions of body 
part concepts such as, for instance, HAND, HEAD, EYE, TONGUE, etc. Internal organs 
are usually counted among the body parts. For example, HEART has been the sub- 
ject of a large amount of phraseological work. These studies also show the strik- 
ingly close similarities in the area of symbolization (HEART is the imaginary organ 
of positive emotions in all standard European languages, without exception).⁶


5 Compare the examples discussed in Piirainen (2016: 534—610). 

6 Only a few isolated relics may suggest that this was not always the case, cf. the idioms to learn 
something by heart and French réciter par coeur “to recite by heart” which reveal an earlier semio- 
tization of HEART as the seat of mental activities. 
That these Western languages are so consistent becomes particularly apparent
when we turn our attention to languages of distant cultural areas. For some
time, several East Asian standard languages have been the subject of phraseolo- 
gical research. They have significantly expanded our knowledge of symbolic 
functions of internal organs in figurative lexical units. As Yu (2003) points out, a 
wealth of Chinese idioms reflects the pre-scientific concept of GALL BLADDER, 
which is deeply anchored in the theory of internal organs in traditional Chinese 
medicine: the gall bladder serves to make judgments and decisions and deter- 
mines the degree of a person’s courage. According to Siahaan (2008), in Indone- 
sian it is the LIVER (hati), among other things, that correlates with concepts such 
as HEART and MIND in European culture.⁷ A key concept in Japanese culture called
HARA, in turn, has no equivalent in Western languages. It is used mainly in male 
speech. Translations of hara as ‘(the inner part of the) belly’, ‘abdomen’, etc. are 
only makeshift. HARA is regarded as the location of the mind, the center of mental 
energy and emotions, and emerges in a number of idioms (e.g. hara wo waru “to
split the belly” ‘to reveal one’s thoughts, to tell the truth’, cf. Hasada 2002, among 
others). 

What these body-based concepts have in common and what distinguishes 
them from those in Western languages is not only the semiotized organs that 
seem to be unusual (GALL BLADDER, LIVER, BELLY).⁸ Rather, these organs are seen as
the seat of thoughts and/or emotions. Thus, the concepts do not comply with the 
common dichotomy in Western culture, the “Cartesian duality” of HEAD and 
HEART, i.e. the division between intellectual thoughts and emotions. A look at fur- 
ther languages of distant continents, but also at the Basque language, reveals the 
extent to which this Cartesian dualism is confined to the standard languages of 
Europe.⁹


7 LIVER as a semiotized concept is in fact much more widespread; it is documented in many 
languages of the East Asian and Pacific region (see, for example Franklin 2012: 189-190), but 
also known in European languages. There is a long cultural and historical tradition as to why the 
LIVER represents a special place of emotions; cf. Dobrovol’skij and Piirainen (2005: 127-128). 

8 According to Sharifian et al. (2008: 6), this kind of abdominocentrism is characteristic of 
Southern Asian and Polynesian cultures while Chinese favors cardiocentrism (HEART as the seat 
of mind and emotion), and cerebrocentrism (HEAD as the seat of intellect) dominates West Asian, 
European and North African cultures. However, this is not entirely correct (cf. Ibarretxe-Antuñano
2012: 257; cf. also footnote 6).

9 Lutz (1987: 308) expressed the same ideas in her research on emotions of Ifaluk people (Mi- 
cronesia): “[T]he dichotomous categories of ‘cognition’ and ‘affect’ are themselves Euroameri- 
can cultural constructions.” 
2.1 Kilivila


A good example is Kilivila, an aboriginal Austronesian language spoken in the 
Trobriand Islands (Papua New Guinea), which, from a European perspective, has 
extremely unusual conceptualizations of body parts and internal organs. Cultural 
anthropologist Bronisław Kasper Malinowski carried out intensive field research
on the island of Kiriwina, in the years 1915-1919, and managed to collect a large
number of figurative units, including numerous magical formulae, and made 
them available in translation. 

For the discussion of the somatic idioms, Malinowski’s investigations on bo- 
dy-part terminology of the Trobriand Islanders are of great interest. The linguistic 
data presented in his famous book, Argonauts of the Western Pacific (1922), are 
fundamentally different from what was later written about European somatisms. 
His knowledge on how Trobriand Islanders speak about body and mind as well 
as his ethnophysical theory about the body and its role in emotions, knowledge, 
thoughts, memory, etc., however, received little attention on the part of phraseology
research. In new field research, in the 1980s, Gunter Senft once again investigated
the peculiarities of the Kilivila language, including the body-part terminology,
and fully confirmed Malinowski’s results.¹⁰ The following quotation
gives an insight into the way in which intellectual activities are conceptualized 
either by LARYNX or by BELLY, in the idioms of Kilivila: 


The mind, nanola, by which term intelligence, power of discrimination, capacity for learn- 
ing magical formulae, and all forms of non-manual skill are described, as well as moral 
qualities, resides somewhere in the larynx. [...]. The memory, however, the store of formulae 
and traditions learned by heart, resides deeper, in the belly. A man will be said to have a 
good nanola, when he can acquire many formulae, but though they enter through the lar- 
ynx, naturally, as he learns them, repeating word for word, he has to stow them away in a 
bigger and more commodious receptacle; they sink down right to the bottom of his abdo- 
men. 

(Malinowski 1922: 316) 


The Kilivila speakers have two body-based concepts that are located in different 
body parts and relate to different layers of intellectual activity. One of these concepts
is located somewhere in the larynx - it comes close to what Europeans call
“mind”. In addition, there is a “deeper” concept, which is stored in the
belly/abdomen - it is similar to what is understood in the Western world by “memory”.


10 Compare Senft (1986, 1998), especially his “Appendix A: Kilivila body-part terms” (Senft
1998: 94-96) and “Appendix B: Speaking idiomatically about the body and the mind in Kilivila”
(Senft 1998: 97—104). 
This concept manifests itself in a series of idioms, in which words related to ‘larynx’,
the central organ of voice and speech, take on meanings like ‘mind’,
‘intelligence’, ‘idea’ or ‘wants’ (Senft 1998: 94, 100-104). Conversely, words for 
‘speech defect’ stand for ‘stupidity’ (Senft 1998: 100). From a cultural point of 
view, this area of the mental activities is the more important, because it allows 
Trobriand islanders to memorize and fix in the mind the world of magical formulae
that is central to their culture.¹¹ The “store of formulae and traditions” and the
ability “to acquire many formulae” referred to in this passage lead us to the function
of these idioms in the society and culture of the Trobriand Islanders.


2.2 Basque 


Another example of unique conceptualizations comes from Basque, an isolate 
spoken in several varieties on both sides of the Western Pyrenees. Basque is the 
only remaining language of the oldest attainable stratum of Southwest European 
languages. Until the last century, it was used predominantly in oral communica- 
tion and its written tradition is relatively young (older writings originated almost 
exclusively in a religious context). Basque has always been in contact with other 
languages and cultures, and indeed has been influenced by the dominant philosophical
and religious movements throughout history. Nevertheless, the language
has preserved some outstanding concepts, as outlined in the works of Ibarretxe-Antuñano
(2008a, 2012). Her studies on external and internal body-part
related concepts in Basque unveil certain conceptualizations that are deeply en- 
trenched in this language and unparalleled in other European languages. 

A dichotomy between the rational and irrational sides of the body as two sep- 
arate entities is not unknown in Basque (i.e. BURU ‘head’ as the seat of intellect 
and BIHOTZ ‘heart’ as the seat of emotions) but is regarded as affected by global 
influences (Ibarretxe-Antuñano 2012: 266). In contrast, there is the concept GOGO
which comprises both intellect and emotions and is truly unique to Basque. It is 
described “as a kind of ‘primitive (irrational) thought’, that is, an intellectual reasoning
process based on intuition and emotion” (Ibarretxe-Antuñano 2008a:
122).


11 Malinowski (1922: 317) reports how one of his informants answered the question whether he 
had any more magic formulas to produce. “With pride, he struck his belly several times, and
answered: ‘Plenty more lies there!’ I at once checked his statement by an independent informant,
and learned that everybody carries his magic in his abdomen.”
Examples show the wide scope of meanings of GOGO in Basque figurative lexical
units. On the one hand, intellect and thought can be to the fore, comparable
to the functions of HEAD in other languages, as in the expressions gogo argi “gogo 
light” ‘bright mind’, gogamen ‘intelligence’, gogoeta ‘thought’ or gogo-an 
izan “gogo-LOC be.PFV” ‘to remember’. On the other hand, emotions and feelings
can be the focus, similar to functions of HEART, cf. Basque gogoalai “gogo.happy”
‘jovial, cheerful’, gogo-a berotu “gogo-ABS heat.PFV” ‘to encourage’, gogo-ak izan
“gogo-ABS.PL have.PFV” ‘to feel like’, gogohandi “gogo.big” ‘magnanimous, generous’
and the like (cf. Ibarretxe-Antuñano 2012: 266-267).


These examples show that gogo harmoniously unites these two apparent contrary concepts 
in one; in a way, gogo is a kind of primitive thought or rational soul, where there is an intel- 
lectual reasoning process, but one based on intuition and emotion; or to put it in another 
way, an intellectual reasoning process prior to any distinctions between feelings and 
thought - which, in fact, implies that reason and feelings are not differentiated at all. 
(Ibarretxe-Antuñano 2012: 267)


The old pre-Indo-European Basque tongue at the edge of Europe seems to pre- 
serve a unique “pre-Cartesian” worldview, since there are no parallels to the GOGO 
concept in languages other than Basque. However, this is surprising only against 
the background of the European standard languages investigated so far, whose 
images and conceptualizations are very similar to each other. Within the vast 
amount of literature on idioms containing somatic constituents, no attention has 
been paid to the peculiarity of Basque figurative language, as is also the case with 
phenomena demonstrated by the body-part terminology of Kilivila and other Ab- 
original languages. This leads us to the next section. 


3 Conceptual Metaphors and "Universality" 


The so-called “universality” of conceptual metaphors, as supposed by Lakoff's
“Cognitive Metaphor Theory” (e.g. Lakoff and Johnson 1980; Lakoff 1987), is one
of the principles of figurative language, which for a short time were regarded
as irrefutable. It has been argued that certain particularly typical metaphors can 
be found in all languages and cultures. This assumption has been ascribed to the 
concept of “embodiment”, the idea that body experiences underlie a number of 
conceptual metaphors, and to the sameness of human beings and their same 
physiological mode of operation across different cultures. However, these metaphors,
postulated as ubiquitous, if not universal, were discussed especially from
an Anglocentric viewpoint and only on the basis of a small number of languages.
This theory, which was innovative at the time, has undoubtedly promoted
the study of figurative language, but it has also drawn criticism from various 
sides. This is not the place to expand on this. Rather, some works on lesser-used, 
culturally distant languages which clearly disprove the universal character of 
these metaphors should be examined. These studies have been almost com- 
pletely disregarded in traditional research on metaphors and figurative language. 


3.1 TIME IS NATURE


A linguistic community which belongs geographically to Europe disproves the 
idea of “universal” metaphors. This language is Inari Saami, a declining minority
language on the edge of Northern Europe, which does not fit into the system of 
widespread conceptual metaphors but displays completely independent images. 
The works on the phraseology of Inari Saami, a Uralic language spoken by about 
350 people in Northern Finland, are of particular importance to the history of European
phraseology. It was the Idiom Dictionary of Inari Saami (Idström and
Morottaja 2006) which for the first time documented the figurative language of
an indigenous population in its originality; it was followed by several other publications
(Idström 2010, 2011, 2012).

The merit of this pioneering work is, on the one hand, the description of the 
methods used to make the older speakers of Inari Saami remember the authentic 
figurative expressions. On the other hand, results have been achieved for the theory
of phraseology and metaphor research. The study shows that Inari Saami has
its own conceptual metaphors, also created at a more concrete level, from the
previous living conditions of an indigenous population which have no parallels
in any other European languages studied so far. We will restrict ourselves to the
well-known metaphoric model TIME IS MONEY and its parallels in Inari Saami. 

According to Idström (2010), the traditional Inari Saami culture was polychronic.
A metaphor such as TIME IS MONEY would have no place in the figurative
language of the Saami.¹² Their culture was fundamentally different from that of
most other societies in Europe. It was based on fishing, hunting and reindeer- 
husbandry in the harsh conditions of Lapland. Until the 1900s, the lifestyle of the 
Inari Saami community was determined not by the calendar but by the course of 
the seasons, the knowledge of nature and animal behavior. The Saami made 
every endeavor to predict the weather and timed their actions according to it. 


12 Mueller (2015) discusses examples of other languages where the TIME IS MONEY metaphor did 
not exist but gained ground due to the “Westernization” of traditional non-Western cultures. 
There was no preset schedule for determining actions such as fishing and hunting.
It was important to know as exactly as possible what the weather and the
snow conditions would be like. 


Time was not as strikingly objectified as in modern post-industrial cultures where the TIME 
IS MONEY conceptual metaphor seems to be prevalent. Traditional Inari Saami time was 
mainly contextual, not a centre of attention, and definitely not something worth money. 
Logically, the linguistic Inari Saami metaphors describe time systematically as nature or as 
something that happens in nature. 

(Idström 2010: 174)


The timing of human action was based on observations in the natural environ- 
ment and spontaneous reactions to these observations. Based on a number of In- 
ari Saami idioms, the author has reconstructed the conceptual metaphor TIME IS 
NATURE. For example, the ‘beginning of autumn’ is denoted by riemnjis kamás-iió-
is koco “fox is hanging up his legs” (as if the red leaves on the trees were the fox's
legs or socks hung on the trees by him), and the ‘period of the ongoing winter’ is
called taan muottuu ääigi “during the time of this snow” (snow referring metonymically
to the time when the snow is on the ground).

In sum, the conceptual metaphor TIME IS NATURE is appropriate for the tradi- 
tional polychronic Saami society, connected with the lifestyle in the arctic envi- 
ronment. This metaphor has no parallels in the hitherto analyzed standard Euro- 
pean languages. This does not mean that it could not occur in other languages, 
for example, in languages of the Arctic region with the same climatic conditions 
(such as Komi or Tundra Nenets). However, any kind of investigation is lacking. 


3.2 UNDERSTANDING IS HEARING 


Other examples are the conceptual metaphors UNDERSTANDING IS SEEING and KNOW- 
ING IS SEEING, respectively, which are found in the standard European languages 
that have been examined so far, both in idioms¹³ and in individual words, especially
verbs of seeing (cf. you see? meaning ‘do you understand?’). This metaphor
has attracted the interest of Lakoff and Johnson (1980) and several of their suc- 
cessors. Perhaps the best known is the study by Sweetser (1990). Although her 


13 Compare, for example, widespread idioms like to bring something to light ‘to make something 
known that others would prefer remained unknown’, behind someone's back ‘without knowledge
of another person’: light is required for seeing and recognizing things in the environment,
and what is behind someone's back remains hidden, therefore unknown (Piirainen 2016: 417, 
485). 
investigation is not concerned with idioms, her analysis of the semantic extension
of perception verbs is highly relevant to the above-mentioned metaphors.

Sweetser (1990) hypothesizes that vision has primacy, due to the human cog- 
nitive structure and the metaphorical and cultural aspects of this structure. 
Therefore, verbs of higher intellection, such as to know and to think are recruited 
from verbs of seeing. She suggests that verbs meaning ‘to hear’ would not take on
these readings - an assumption which in her opinion applies to all languages and
cultures. According to Sweetser, the meaning-extension from SEEING to KNOWING
and, accordingly, the metaphor UNDERSTANDING/KNOWING IS SEEING are universals.

The primacy of SIGHT has been questioned several times. Clear criticism came
from Ibarretxe-Antuñano already in 1999, when she rejected this idea of
“universality” - among other things - on the basis of Basque (cf. especially
Ibarretxe-Antuñano 2008b). Looking at languages of distant cultures is particularly helpful
in this context. Evans and Wilkins' (2000) study on perception verbs in Australian 
Aboriginal languages must be mentioned here. Based on the material of more 
than 60 languages, these hypothetical universals have been clearly disproved. It 
is exactly the opposite: Australian languages gain verbs of cognition like to think 
and to know from verbs meaning ‘to hear’, and not ‘to see’. It can be inferred that 
the allegedly universal metaphorical concept UNDERSTANDING IS SEEING has no 
place in these languages but is replaced by UNDERSTANDING IS HEARING. Evans and 
Wilkins (2000: 580–586) provide convincing social and cultural reasons for the
semantic extension of HEARING to KNOWING and THINKING in Australian Aboriginal 
societies, pointing to the role of oracy vs. literacy in privileging hearing in 
nonwriting cultures as opposed to sight in literate cultures. 

Similar results also come from other regions of the world. Among the few 
works on metaphors in remote, non-Western languages that are available so far, 
some other languages have been found to be unfamiliar with the metaphor UN- 
DERSTANDING IS SEEING. Let us look at Flathead Salish, a critically endangered 
American Indian variety in Montana, USA. Among the Salish metaphors collected 
by Sherris et al. (2015) the concept of UNDERSTANDING IS HEARING clearly emerges. 
Sherris et al. (2015: 122) point to the cultural weight given to SEEING or HEARING as
a means of understanding and name the orality of Salish as the cause of the difference:
“[I]n Salish culture, more importance is given to oral traditions, while in
cultures that primarily speak English, more importance is often given to written
traditions. However more data points would be required from other languages to
make this assertion definitively.”

The similarity of English metaphorical schemes with those of many other Eu- 
ropean languages and their absence in distant non-Western languages has long 
been pointed out by anthropological work on metaphors (cf. Keesing 1985). It has 
recently been confirmed by further research on lesser-used non-European languages,
be it metaphors for DEATH in Maori (J. King 2015), TASTE metaphors in minor
Papua New Guinean languages (P. King 2015) or body-based EMOTION metaphors
in Safaliba (Schaefer 2015), a declining language spoken in Ghana.
According to Schaefer (2015: 93) 


it seems prudent not to adhere too strongly to proposed “universals” unless they are sup- 
ported by field research from diverse language families worldwide. Until a great many stud- 
ies of conceptual metaphors are done on lesser-known languages, we will not even know 
what questions to ask about what such “universals” might really look like. 


4 Pragmatic Functions 


What has already been mentioned in sections 2 and 3 also applies to the prag- 
matic functions of the phrasemes: the European standard languages examined 
so far also appear remarkably consistent in this field. Pragmatics has long been a 
popular subject of phraseology. The numerous publications on this topic refer to 
general discussions within the framework of a theory valid for all current lan- 
guages, and not to contrastive approaches that would elaborate interlingual dif- 
ferences. 

The pragmatic functions attributed to phrasemes, especially to idioms, are of 
varied nature. Older traditional phraseology research regarded the “semantic
surplus value” and the associated “higher expressivity” (Kühn 1985), later also
the “communicative” or “connotative surplus value”, as prominent features of
phrasemes in contrast to non-phraseological word combinations. Subsequently, these
views were qualified; it was found that a large number of idioms are rather un- 
marked in terms of their pragmatic, stylistic or connotative functions. Other prag- 
matic functions of idioms, well-known in many languages, consist of euphemis- 
tic allusions, often used in a humorous mental detour instead of directly naming 
what is meant (e.g. negative issues). But this is not the point here; we only want 
to note that European phraseology research has revealed no serious differences 
in the pragmatic behavior of phrasemes among the diversity of languages inves- 
tigated so far. 

The results would be different if historical linguistic layers were included. For 
example, didactic and educational or morally warning functions of paroemia and 
adages would play a much larger role (see, for example, Hallik 2007). Quite dif- 
ferent dimensions of pragmatics can be observed if distant minority languages - 
far from the Western cultural area and used predominantly in oral form - are in- 
cluded. Several publications on figurative and formal language of non-European 
lesser-used varieties provide us with examples. We limit ourselves here to three
complexes of pragmatic functions, which are distinctly different from the uniformity
of the current European languages.¹⁴


4.1 “Secret Languages” 


The first complex is intertwined with the veiling functions of euphemistic expres- 
sions, as they are known in the European languages as well, but which manifest 
themselves in a very different way. Recently, a few studies on endangered languages,
mainly of the East Asia-Pacific region, have emerged which demonstrate
the disguising potential of entire speech systems. What is interesting for
the study of phraseology is the fact that these “secret languages” consist of idi- 
oms in the traditional sense, but the figurative expressions (used to veil tabooed 
concepts) are more obscure and occur in an abundance that permeates the entire 
language system: only insiders know the code; for outsiders, the language is in- 
comprehensible. 

Karl J. Franklin’s outstanding work on Kewa (e.g. Franklin 1972, 1977, 2003, 
2012) should be mentioned here as a representative of other studies. Franklin re- 
ports on different categories of veiled language such as tabooed speech, intimate 
speech, ritualized speech, coded speech as warning and prohibition, and saa agaa, 
the most salient and prototypical example of “veiled language”: 


Saa agaa occurs in a variety of modes whose codes are interpreted according to the cultural 
communication setting. Although disguised speech is most often spoken, it may be shouted 
as a warning (puri pane agaa), a challenge (yada malue agaa), interpreted from song [...],
expressed in courting (remani agaa), or even whispered (mumu agaa). The overall purpose 
and role of disguised speech is to leave the hearer with a certain amount of bewilderment, 
so it would defeat its purpose if the communication was completely transparent and not 
subject to interpretations. 

(Franklin 2012: 192) 


14 A unique function of formulas in non-written indigenous languages must be disregarded 
here for reasons of space. It is formulaicity as a prop in the memorization of oral literature. As 
Riesenberg and Fischer (1955: 9) put it for Ponapean (Micronesia), various stereotyped figures of 
speech “seem to be used solely as mnemonic devices for recalling the legends or lore to which 
they are attached [...]”, a function that connects them with literature of classical antiquity:
mnemotechnical support has also been ascribed to the abundance of formulas in the Homeric poems
(e.g. Lardinois 2001; Minchin 2001; Sale 2001). 
An example is the Kewa idiom rigi-areke lapo rata madi-ta aa. It is glossed as
“bamboo.knife1-2 two both carry-3.sg.prf man” and translated literally as “(be
careful of) a man carrying both the rigi and arege bamboo knives”. The figurative
meaning discloses itself only if one knows the code and its components. The id- 
iom is veiled because one knife would be adequate and here the figure of speech 
implies that if two are used the man is showy or pretentious, perhaps not to be 
trusted. The knives are codes for the inferred metaphorical characteristics of cer- 
tain kinds of people (Franklin 2012: 191f.). Understanding such coding is a pre- 
requisite for interpreting the Kewa metaphorical and pragmatic system. 

Noteworthy are also other forms of the highly idiomatic tabooed Kewa lan- 
guage which are used almost exclusively by men, members of a particular group, 
while they are preparing for a cult or carrying out other kinds of joint works. A 
well-researched subcategory is the so-called pandanus language.¹⁵ It is a secret
contrived linguistic system which is used by men when they traditionally camp 
in the forest to harvest the pandanus nuts. Many ordinary Kewa expressions must 
be avoided and replaced by codes. “For example, when they took their dogs on
such trips, the owners gave them a derisive and idiomatic name that was ‘magical’,
following a code often found in ‘disguised speech’” (Franklin 2012: 193). This
is probably due to the idea that spirits in the deep forest of the mountains should 
not be disturbed by profane, non-ritual language. 


4.2 “Authority” 


4.2.1 Terminological Problem 


Other complexes of pragmatic functions we want to look at are grouped around 
the aspect of “authority”, which has been attributed to figurative and formulaic 
expressions. A high appreciation of proverbs, which were regarded as an embodiment
of general truth, has been attested for several epochs of linguistic history
and diverse regions and is still alive in some small speech communities around 
the world. The literature on the figurative aspects of lesser-used languages out- 
side Europe gives many examples. However, a terminological problem must first 
be considered. Most languages do not distinguish between proverbs and related 


15 Cf. Franklin's (1972) detailed study on the Kewa pandanus language. Similar observations 
come from Kalam, a small language spoken in the Papua New Guinea Highlands: there, a special 
“‘Pandanus Language’ is used in the forest when on expeditions to harvest and eating pandanus
nuts and when hunting or eating cassowaries” (Pawley 1993: 117).
elements of oral folklore. Often the same word applies to proverb, moral story,
parable, riddle, and the like. In many publications, “wise words” is used as a generic
term for these kinds of figurative speech. As a representative example, let us look
at Keith Basso’s famous book entitled “Wise Words” of the Western Apache. Wise 
words are a “distinctive speech genre associated with adult men and women who 
have gained a reputation for balanced thinking, critical acumen, and extensive 
cultural knowledge” (Basso 1976: 99).¹⁶

“Wise words” may be used by the native speakers themselves, however, usu- 
ally along with a differentiated terminology of the figurative expressions of their 
languages. Finnegan (1970: 390) gives examples of African languages: 


The Fulani term mallol for instance, means not only a proverb but also allusion in general, 
and is especially used when there is some deep hidden meaning in a proverb different from 
the obvious one. Similarly with the Kamba term ndimo. This does not exactly correspond to 
our term “proverb” but is its nearest equivalent, and really means a “dark saying” or “metaphorical
wording”, a sort of secret and allusive language.


We cannot go into every detail of manifestations of “proverbs” in the distant lan- 
guage communities which seem odd to us: proverbs can occur in chants, they can 
be sung, but can also be drummed: a very special category is created by the drum 
proverbs among Bantu ethnicities in Cameroon, which are performed by striking 
sequences on the wooden slotted drum (Piper 1989). All this seems to distinguish 
the concept of “proverb” in these languages clearly from its European counterpart.
The terminological problem, therefore, must always be kept in mind when
evaluating the functions of formulaic and figurative units of lesser-used lan- 
guages outside the Western cultural area. What is referred to by “proverb” in 
these publications does not correspond to the definitions of proverb in European 
linguistics.¹⁷

These considerations are not trivial: they emphasize that the terms defined 
by European phraseology researchers are not valid worldwide, precisely in view 
of the prevalent orality of the languages considered here. If a term such as proverb 
is discussed or defined, the addition should always be made “this only applies to 
the standard languages that have been examined so far”.


16 Cf. also the discussion on the definition of proverb against the background of African lan- 
guages by Hansford (2003). 

17 This reminds us of Grigory Permyakov’s much discussed theories in his “Grammar of proverbial
wisdom” (Пермяков 1979) which state: As a sign, the proverbs belong to language and as
models to folklore (i.e. as cultural models they belong to the literature of folkloric provenance). 
4.2.2 Authority as Judicial Argumentation


Let us return to the pragmatic functions grouped around the “authority” of prov- 
erbs and related figures of speech. As mentioned above, the purposes of using 
proverbs and other formulaic units are manifold, ranging from all kinds of rhetorical
and didactic intentions to magical functions in performing rituals and ceremonies.
The educational function is probably one of the most important.¹⁸ Let
us look at a function which, from a European perspective, seems to be uncom- 
mon: the use of proverbs as a part of legal procedure, as a method of gaining favor 
in court, as it is reported from several African ethnicities. There is an early contri- 
bution (Messenger 1959) on the Anang people (Nigeria). In the tribunals and hear- 
ings of the Native Court proverbs were skillfully introduced and influenced the 
actual decisions. From Gillian Hansford's investigations of the language of
the Chumburung people (Ghana) it can be seen that the practice of obtaining advantages
with the help of proverbs is still common in present-day African speech
communities (Hansford 2003: 73-75). 

An episode from an Anang court negotiation described by Messenger (1959: 
68-69) may serve as an example. It is about a notorious thief, who was again 
charged with theft. The plaintiff aroused antagonism towards the defendant by 
quoting the proverb If a dog plucks palm fruits from a cluster, he does not fear a 
porcupine. The rhetorical intention was to say that a chronic thief like the accused 
would not be afraid to steal time and again, just like a dog which can deal with 
the sharp needles of the palm fruit: it would be unafraid even of the porcupine's 
prickles. But that is not all, for the accused reacts with another proverb: A single 
partridge flying through the bush leaves no path. 


In using this proverb the accused likened himself to a single bird, without sympathizers to 
lend him support, and called upon the tribunal to disregard the sentiments of those in at- 
tendance and to overlook his past misdemeanors and judge the case as objectively as pos- 
sible. 

(Messenger 1959: 69) 


Both proverbs took a prominent place as authorities for the judicial decision — a 
function which is hardly conceivable in court negotiations in the West. 


18 Taha (2011: 1) reports for Dongolawi Nubian, an endangered language spoken in Sudan, that 
“proverbs, often performed by the elderly, provide guidance, particularly to the young speakers. 
They generally recommend the adaption of the soundest course of action in life (as it pertains to 
the Nubian culture), and they warn of the consequences of bad and/or unreasonable choices.”
4.2.3 Authority of Proverbial Knowledge


Let us conclude this section with another aspect of “authority”, i.e. the fact that
the use of proverbs and idioms can confer on a person a status completely different
from that which would be assumed from the viewpoint of the standard
European languages that have been researched so far. There is a remarkable article
by the anthropologist Firth (1926) on Maori proverbs. Firth (1926) was one of
the first to emphasize the need to take into account the context in which a proverb 
is used. His results included the observation of great authority attributed to the 
use of proverbial utterances. He reports that the ancestors of a Maori, particularly 
if men of high rank, were deeply venerated and great stress laid upon their last 
proverbial words on the deathbed: 


[...] they were quoted for years or even generations afterwards. From this, it is evident that 
the opinion expressed in any proverb, especially if known to have been uttered by some 
dead chief of high renown, was a matter of grave import, representing as it did the words 
and authority of the venerated past. 

(Firth 1926: 257-258) 


Another example comes from Bété, one of the endangered languages spoken in 
the Ivory Coast (Zouogbo 2015). Bété figurative units are deeply rooted in the cul- 
ture of Bété people, in their ideas of cosmogony and in oral narrative traditions. 

Various expressions are based on “intertextuality”, as they summarize the 
gist and moral of a well-known narrative. It is well established that in literary languages
a large number of phrasemes originate from once well-known texts. The
fact that this also applies to languages without a literary tradition may be surpris- 
ing. However, here it is not written texts, but narrations, reports or legends of 
once important events which are passed down by word of mouth and provide the 
basis for figurative expressions in a series of lesser-used languages. 

The use of figurative expressions is associated with particular social func- 
tions, as Zouogbo (2015) vividly explained. If someone presents a thought with 
the help of an idiom or a proverb, he shows that he is well acquainted with tradi- 
tional culture. This, in turn, gives him recognition, power, and authority. For ex- 
ample, in order to be eligible for appointment as a village elder in a village com- 
munity, the person concerned must prove that he has a full command of figura- 
tive expressions. Thus, transmitting a message by means of an idiom or a proverb 
indicates not only language competence but also knowledge of the traditional 
culture and therefore means authority. 

These functions of idioms and proverbs differ significantly from those known 
in societies with literary traditions. Further studies on languages of preliterate 
communities are urgently needed to gain an overall picture of the occurrences
and manifestations of phrasemes and formulas, not just based on a narrowly limited
selection of languages (cf. the quotation from Nettle and Romaine in the following
section).


5 Concluding Remarks 


In their remarkable book of 2000, Vanishing Voices. The Extinction of the World’s 
Languages, Daniel Nettle and Suzanne Romaine report most vividly the extent to 
which the extinction of languages affects mankind’s cultural knowledge.
At the same time, the authors generally criticize linguistics, which does not
seem to be interested in this cultural loss, and by no means tackles the investiga- 
tion and documentation of endangered languages. The authors use a suitable 
comparison to illustrate the deficiency of linguistic research on more exotic lan- 
guages: 


Linguists need to study as many different languages as possible if they are to perfect their 
theories of language structure and to train future generations of students in linguistic anal- 
ysis. [...] New and exciting discoveries about language are still being made. There is every 
reason to believe that what we know now is but the tip of the iceberg. [...] Satisfying answers
to many current puzzles about languages and their origins will not emerge until linguists 
have studied many languages. To exclude exotic languages from our study is like expecting 
botanists to study only florist shop roses and greenhouse tomatoes and then tell us what the 
plant world is like. 

(Nettle and Romaine 2000: 11; italics by E.P.) 


Nettle and Romaine do not refer explicitly to phraseology or to studies of figura- 
tive and formulaic language in a broad sense. But how much more would their 
appeal apply to these areas! By analogy with the theories of language structure 
mentioned, the knowledge of the present theory of phraseology is founded on 
only a small fraction of the world's languages, on a very one-sided, limited mate- 
rial basis comparable to the florist shop roses and greenhouse tomatoes in Nettle 
and Romaine's figure of speech. As mentioned at the beginning, languages used 
only orally, minority languages of distant continents, as well as oral versions of 
otherwise well studied European languages, have been almost completely ex- 
cluded from the study of phraseology. However, the few studies on the latter va- 
rieties have brought about new and exciting discoveries, to use Nettle and Ro- 
maine's phrase. 
This article is therefore also an appeal to free phraseology, through new research
approaches based on new, empirically obtained speech data, from its self-imposed
restriction. This appeal has two objectives: to incorporate both
endangered minority languages and all modes of oral languages more intensely 
than hitherto into phraseology’s research spectrum. 

Concerning the first objective: In my opinion, it is the greatest task of contem- 
porary linguistics to examine figurative and formulaic expressions of the endan- 
gered languages worldwide before they are lost forever in the near future. Figura- 
tive units of a language are vulnerable. All declining minor languages should be 
seen as cases of extreme urgency: the documentation of their idioms and formu- 
las should be started immediately when a language becomes potentially endan- 
gered, be it under the pressure of a more dominant major standard language or 
endangered by other factors. “In such a situation metaphors and figurative nuances
are the first to vanish, even if the language continues to exist” (Idström and
Piirainen 2012: 18). The benefit of such studies for phraseology, linguistics and 
cultural sciences is obvious. 

In the last few years, there has been a growth of research interest in linguistic 
diversity and declining minor languages; for some of them it was possible to rec- 
ord figurative elements in their originality, as shown above with individual ex- 
amples. But for other regions of the world this is an impossible task. This is par- 
ticularly evident in Aboriginal Australia. According to Evans (2007), language 
loss is more accelerated than in any other continent, with a 95 percent extinction 
rate expected to be reached during the course of the coming decades. Miller (2013: 
405) speaks of at least 228 Indigenous languages in Australia, for which much 
work remains to be done with regard to figurative language. Such investigation 
needs to be conducted sensitively and with the help of competent native speakers.¹⁹
Most of these languages will become extinct without their figurative expressions
being recorded. Although this situation is regrettable, other projects could
be carried out with ease, even in previously unexplored European language vari- 
eties. 

Concerning the second objective: Phraseology research in recent decades has 
increasingly been oriented towards the written language. That was not always 
the case. Harald Burger (1979) reports on a project of the University of Zurich from 
the early days of Germanist phraseology research, when empirical data had to be 
collected in order to gain theoretical insights. Half of the data had to consist of 
written texts, and the other half of tape-recorded, oral texts. Even these initial 
research approaches yielded their own results, including the finding that idioms 


19 Personal communication (April 18, 2017) by Julia Miller, Adelaide, president of AustraLex. 
show a great variability in oral communication, a phenomenon that was to be
“rediscovered” decades later by the analysis of large text corpora.²⁰

Already the first works on varieties, dialects and colloquial languages 
handed down orally, including that of Luxembourg, which was distinguished by 
its dialectal origin and the predominance of the oral domain, showed deviations 
from the hitherto established theories, e.g. regarding the stability or variability of 
idioms, the so-called anthropocentrism, usage restrictions of idioms (among 
them gender restrictions which are due to certain images), as well as specific 
pragmatic functions of conventional word plays, all of which up to that time had 
not been known to this extent. These results were recorded in the articles of the 
Handbook on Phraseology (HSK, Burger et al. 2007).²¹

The collection of linguistic data by means of survey research, that is, surveys 
of native speakers into their phraseological competence, proved to have an un- 
imaginable advantage over research using exclusively written language data: it 
is not possible to ask the producers of written texts about their intention, e.g. 
whether they have used an idiom ironically or jokingly. 

For some time now, phraseology, also on the part of the younger generation 
of researchers, has experienced a remarkable upswing. “New empirical data” are
particularly in demand, which means corpus linguistic analysis. This, however, 
means not only a renunciation of all the oral phenomena of language, but also a 
further, self-imposed restriction of the linguistic material basis, not only to the 
literary language in general (the only possibility in the case of dead languages) 
but to a special kind of text, namely, the language of the press, which is currently 
dominant in the corpora of literary languages. 

In contrast, all areas of the oral use of phrasemes are completely untapped. 
In German colloquial language alone, hundreds of idioms are in circulation, 
which have mostly penetrated the regional everyday language from local dia- 
lects, but which have not been lexicographically recorded at any point. The same 
applies to figurative lexicon units of all dialects and minority languages world- 
wide. Thus, this paper should be seen as an appeal to carry out new fundamental 
empirical research and to make greater use of linguistic reality, which can be 
achieved not only on the basis of written language corpora, but also by relevant 
information provided by the members of a speech community themselves, 
thereby expanding the theoretical framework of phraseology. 


20 Examples of variability of orally used idioms are open patterns like not all X in the Y meaning 
‘not to be in one's mind’, which can even be regarded as the beginning of construction grammar 
(Burger 1979: 96-97). 

21 Cf. Moulin and Filatkina (2007); Piirainen (2007); Schmidlin (2007). 
Stephan Elspaß
Areal Variation and Change in the 
Phraseology of Contemporary German 


Abstract: Areal variation and change in phraseology is still a remarkably under- 
developed area of research. Past studies of areal phraseology have either been 
restricted to small localities and regions, or have required considerable effort in 
data collection. There is a complete lack of studies on phraseological change in 
contemporary German. Recent research projects on German areal linguistics have 
used internet surveys, as in the case of the Atlas zur deutschen Alltagssprache 
(AdA) ‘Atlas of colloquial German’ (with data from mostly spoken regional ver- 
naculars), or large corpora, in the case of the Variantenwörterbuch des Deutschen 
(VWB) ‘Dictionary of lexical variation in German’ and the Variantengrammatik 
des Standarddeutschen (VG) ‘Regional variation in the grammar of Standard Ger- 
man' (with data from written Standard German). These new methods can be used 
to obtain reliable data on the areal distribution of phrasemes in contemporary 
German usage with relatively little effort. Moreover, a comparison of recent AdA 
data with data from the Wortatlas der deutschen Umgangssprachen (WDU) ‘Word
Atlas of colloquial German’, collected in the 1970s and 1980s, can reveal developments
in the areal distribution of phrasemes on the level of colloquial speech.
This article aims to demonstrate the potential of such research approaches for the 
study of variation and change in phraseology and will use selected examples 
from the AdA and the VG for illustration. 


1 Introduction 


Diatopic variation and change are central research issues in modern variationist 
linguistics and historical linguistics. Phraseology, however, is comparatively 
marginal in this area of research. As for German, there is relatively little research 
on diatopic variation in phraseology.¹ After some studies on the phraseology of
dialects and regiolectal varieties in the second half of the twentieth century, 


1 Cf. Piirainen (2006) and Sava (2014) for overviews of the field of areal variation in the phraseo- 
logy of German. 
mostly restricted to small localities or regions (cf. Hain 1951; Hünert-Hofmann
1991; Piirainen 2000),² it was Elisabeth Piirainen who carried out the first major
survey on diatopic variation in the phraseology of contemporary colloquial Ger- 
man at the beginning of the new millennium. Unfortunately, the results of her 
study, based on about 3,000 written questionnaires from Germany, are only 
available through individual publications (see e.g. Piirainen 2003, 2006, 2009a, 
2009b) rather than as a linguistic atlas of phraseology. Apart from and contrary 
to her own study, “[l]inguistic geographical studies on phraseology are usually
restricted to aspects of the [!] German pluricentricity” (Piirainen 2006: 195). Piirainen
demonstrates that the distribution of phrasemes is not limited to languages
or so-called ‘national varieties’ of a language; they are sometimes distributed
only in certain areas (of different sizes) within these countries, and phrasemes
(idioms in particular) can be widespread across different languages (cf.
Piirainen 2012). Piirainen (2009a) coined the term “areal phraseology” to encompass
a linguistic concept which does not limit the consideration of diatopic
phraseme distribution to individual languages or countries. The investigation of
regiolectal phraseology in particular thus constitutes a research desideratum.³

Though the historical phraseology of German appears to be intensively re- 
searched today (see Friedrich 2007; Filatkina 2018 and the contributions to Fil- 
atkina et al. 2012), there is relatively little research on recent changes in the phra- 
seology of German, particularly changes in areal phraseology. 

The present article will therefore look at areal variation and change in the 
phraseology of contemporary German. Section 2 discusses some conceptual prob- 
lems and issues of variation and change in phraseology. Case studies of areal var- 
iation in German phraseology are presented in section 3. Section 4 presents two 
examples of change in the areal variation of routine formulae in recent decades, 
and section 5 concludes with a brief summary. 


2 Grober-Glück's (1974) study is an exception here, as it covers most of the German-speaking 
area. However, its primary interest is not linguistic; Grober-Glück's study on “motives and motivations
in sayings and folk wisdom” is a by-product of the Atlas zur deutschen Volkskunde (‘Atlas
of German Folklore’), a large ethnographic project conducted in Germany and Austria from 1930 
to 1935. Typically, phraseological comparisons (e.g. dumm wie ein Schaf/eine Gans/Haferstroh 
etc. ‘thick as mince/a brick (etc.)') and sayings which are prompted by extralinguistic facts (e.g. 
‘What do people say when someone's nose itches?') are mapped. 

3 In his 2018 plenary talk “Neue Wege der Regiolektforschung” (‘New paths of research into
regiolects’) at the 6th congress of the Internationalen Gesellschaft für Dialektologie des Deutschen
(IGDD) in Marburg, Michael Elmentaler identified phraseology as one of three prominent and 
particularly rewarding research fields in the future study of German regiolects. 
2 Areal Variation and Change in Phraseology


For the purposes of this chapter, language variation and language change may
be defined as follows (based on Pickl 2013: 39).

— Language variation occurs when more than one linguistic form is used to rep- 
resent a linguistic function. 

— Language change occurs when the association between linguistic function 
and linguistic form alters over time. 


Areal variation in phraseology can thus be defined as the coexistence of different 
phraseological forms (or variants) representing the same linguistic function in a 
given area (e.g. in the German-speaking countries). 

Phraseological change manifests itself in different ways, which may⁴ be
subsumed under three basic types:

— Type I: Phraseme A is replaced by phraseme B over time (cf. example 1). 

— Type II: The internal structure of a phraseme variant is altered over time, ei- 
ther on a paradigmatic level (e.g. one constituent is replaced by another lex- 
ical form, cf. example 2, or by a different morphological form, e.g. a plural 
form of a noun by a singular form, cf. example 3), or on a syntagmatic level 
(e.g. a phraseme is shortened, cf. example 4, the word order is fixed, particu- 
larly in binomials, cf. example 5, or the constituents of a polylexical phra- 
seme have moved together to form a monolexical expression, cf. example 6). 

— Type III: The semantic or pragmatic function of a phraseme changes over 
time (cf. example 7). 


(1) Middle High German dá gienc ez (jmdm.) ûz deme spil > present-day German da wurde 
es (für jmdn.) ernst ‘things become serious (for sb.)’ 
(Friedrich 2007: 1095) 


(2) Early New High German (Luther) Aus den aügen, aus dem hertzen. > present-day
German Aus den Augen, aus dem Sinn. ‘Out of sight, out of mind.’
(Friedrich 2007: 1101) 


(3) Middle New High German (Knigge) in Reihe und Gliedern > present-day German in Reih
und Glied ‘in formation’ 
(Burger 2015: 147) 


4 This is a simplified typology. More elaborate classifications are presented by Friedrich (2007: 
1100-1103), Dräger (2011: 63-170) and Burger (2015: 144-157). 
(4) Middle New High German (Goethe) seinen Platz nehmen > present-day German Platz
nehmen ‘to take a seat’ 
(Burger 2015: 147) 


(5) Old High German (Otfrid von Weißenburg) arme joh riche / riche joh arme > present- 
day German Arm und Reich ‘the rich and the poor’. 
(Hüpper et al. 2002: 91-95) 


(6) Old High German hiu tagu > present-day German heute ‘today’. 
(Dal and Eroms 2014: 37) 


(7) Middle New High German (Goethe) im Augenblick ‘at that very moment’ > present-day 
German im Augenblick ‘now (from the speaker’s perspective)’ 
(Burger 2010: 147-148) 


In practice, the identification of phraseological variants faces several well-known 
problems. I will mention only five: 


1. The definition of ‘phraseme’ as inherently involving polylexicality: Some
compound verbs in German have two orthographic variants, such as (jmdm.
etw.) übel nehmen/übelnehmen ‘to hold sth. against sb.’. Is the discontinuous
variant übel nehmen a phraseological unit?

2. The distinction between phraseological variant, phraseological modification,
and phraseological error, e.g.: Is etw. unter den Tisch kehren (lit. ‘to
sweep sth. under the table’) a modified phraseme (cf. Wotjak 1992: 171), an
error (cf. Elspaß 2002), or by now a “canonical modified phraseological unit”
(cf. Rodriguez Martin 2014), i.e. a variant, if it accounts for about a third of all
occurrences in present-day German print?⁵

3. The distinction between structurally similar phraseological synonyms (e.g.
in letzter Minute/in zwölfter Stunde ‘at the last minute’) and phraseological
variants (e.g. the lexical variants die Achseln/Schultern zucken and the
grammatical variants die Achsel/Achseln zucken, mit der Achsel/den Achseln
zucken ‘to shrug one’s shoulders’).

4. The low frequency of many phraseological types such as rarely used idioms:
Even competent speakers can have trouble identifying such phrasemes or
judging what their ‘normal’ form is, and sometimes frequencies are too low
even in large corpora to establish a ‘normal’ form.

5. The fairly limited usefulness of dictionaries in the investigation of variation
and change of phraseological units due to either missing or unclear
information on the underlying corpora and the lexicographic methodology.


5 According to a quick search in the Google Books Ngram Viewer, conducted in November 2018
(cf. Pfeiffer 2017: 18-22). Stumpf (2016) points to another problem, i.e. the differentiation
between modified phrasemes and phraseme schemata (or phraseological constructions) such as X
oder Y, das ist hier die Frage (‘X or Y, that is the question’, built on Sein oder Nichtsein, das ist
hier die Frage ‘To be or not to be: That is the question’), e.g. Kaufen oder nicht kaufen (‘To buy or
not to buy’) / Hart oder weich (‘Hard or soft’) / etc., das ist hier die Frage.


3 Areal Variation in the Phraseology of Present- 
Day German 


In section 2, I defined areal variation as the coexistence of different forms representing
the same linguistic function in a given area. As for areal phraseological
variation, Piirainen (2009a: 147-152) identifies six categories of distributional
range. For the purpose of this paper, I will present a modified version of her
classification, trimming the six categories down to four⁶ and illustrating them with
examples from Piirainen and from VWB (2016).

1. A phraseme is distributed in only a small region (sometimes within the range
of just a few villages), e.g. the (West Low German) idiom Klumpe nao Wessum
dräägen ‘to carry coals to Newcastle’ (lit. ‘to carry clogs to Wessum’), which
is (or was) only known and used in the dialects of a small region around
Wessum, a Westphalian village that was known for its clog craft.

2. A phraseme is distributed within a larger area, e.g. the idiomatic saying sie
kommt nicht aus den Sträuchern (‘she isn’t making (any) headway’), which is
only known and used in the colloquial vernacular in Westphalia, as it is
derived and translated from an idiom in the Westphalian dialects (Se kümp nich
uut de Strüüke).

3. A phraseme is distributed within a standard variety of a larger region, e.g.
das ist gehopst wie gesprungen (Northern German standard)/das ist gehupft
wie gesprungen (Southern German standard) ‘it’s six of one and half a dozen
of the other’, or es ist noch nicht im Topf, wo’s kocht ‘it’s still early days’
(Eastern German standard, i.e. in the area of the former German Democratic
Republic).

4. A phraseme is distributed within a standard variety of a country, e.g. etwas
gebacken bekommen ‘to get sth. done’ (German Standard German, cf. section
3.1.1 below), die Finken klopfen ‘to take to one’s heels’ (Swiss Standard
German), or es/etw. ist zum Krenreiben ‘it’s a hoot’ (Austrian Standard German).


6 I omitted Piirainen's category „Verbreitung im Raum eines nicht mehr existierenden
Staatsgebiets“ (‘distribution in the area of a defunct state territory’), because it only comprises
phrasemes from the former GDR, which may be more appropriately subsumed under the (new)
category 3 (phrasemes of areal standard German varieties), and the category „Verbreitung
innerhalb des gesamten deutschen Sprachgebiets“ (‘dissemination within the entire
German-speaking area’), because phrasemes falling into this category do not constitute areal variants.


This classification situates areal variation as encompassing variation in collo- 
quial German, understood as usage in (mainly spoken) dialects and regiolectal 
varieties (categories 1 and 2), as well as variation in the (mainly written) standard 
language (categories 3 and 4). I will first concentrate on variation in colloquial 
German (3.1) and then on Standard German (3.2). 

Based on Piirainen’s concept of areal phraseology, and focusing on areal var- 
iation in the phraseology of German, this investigation is guided by the following 
research questions: 

1. What does the areal distribution of phraseological variants look like in
colloquial German vernaculars and in Standard German?

2. Is it possible to establish differences between awareness and actual usage?

3. How do dictionaries deal with areal variation?

4. Is it possible to establish changes in the areal phraseology of German?


RQ 3 can be subdivided into various sub-questions: Do dictionaries account for 
certain phrasemes at all? Do they account for the areal distribution of phrasemes 
or their variants? If yes, are these accounts reliable? Do dictionaries distinguish 
between awareness and usage? Duden 11, the main phraseological dictionary of
Standard German, and VWB, the dictionary of areal variation in Standard German,⁷
will be used to address RQ 3 and its sub-questions.


3.1 Areal Phraseological Variation in Colloquial German 
Vernaculars 


3.1.1 Patterns of Distributional Range 


The present chapter takes the definition of ‘colloquial German vernacular’ to be: 


7 The VWB marks idiomatic expressions with an asterisk. 
die Gesamtheit der Sprachformen, ‚die Sprecherinnen und Sprecher des Deutschen in der
Alltagskommunikation verwenden‘, also ‚im sozialen und funktionalen (‚Nähe‘-)Bereich
des Privaten, des spontanen Gesprächs unter Freunden, Verwandten oder Bekannten oder
auch im informellen Austausch unter nicht näher Bekannten aus demselben Ort, etwa im
örtlichen Lebensmittelgeschäft‘.
(‘registers and variants in everyday communication, i.e. in the social and functional do- 
mains of private life, of spontaneous speech among friends, relatives, acquaintances, or in 
informal situations among people from the same place who are not necessarily close to each 
other, e.g. in the local corner shop’.) 

(Möller and Elspaß 2019, based on Möller and Elspaß 2014: 122) 


This definition accounts for the different manifestations of everyday colloquial 
vernaculars in the German-speaking countries, which may include both dialectal 
and regiolectal varieties. In German-speaking Switzerland, Liechtenstein, and 
many regions in central and southern Germany, Austria, and South Tyrol, the 
vernacular language of everyday life is still dominated by local or regional dia- 
lects. In many other areas, however, everyday language is characterized by su- 
pra-regional varieties, such as regiolects. 

Such colloquial vernaculars are the subject of the long-term Atlas zur 
deutschen Alltagssprache (AdA) (‘Atlas of colloquial German’) project (Elspaß 
and Möller 2003ff.). In 2003, an initial online questionnaire was distributed,
aimed at eliciting everyday language, particularly as used by the younger gener- 
ation in urban areas. The AdA is geared toward lexical variation, but also includes 
questions on morphosyntax, phonetics, and phrasemes (including routine for- 
mulae). This first survey was completed by 1,763 participants, and was followed 
by ten more surveys conducted at fairly regular intervals. Over the years, the 
number of participants snowballed; in the tenth survey, over 20,000 people pro- 
vided data. As the data were collected in a crowd-sourcing approach, it was not 
possible to control the number of responses per location (though, in spite of this 
and rather surprisingly, the overall number of responses is almost balanced for 
gender). Responses were assigned to 500 cities and towns and then aggregated 
by location. The individual maps presented below show either one or two color- 
coded dots per location. In the latter case, the bigger dot represents the most fre- 
quently reported, i.e. the dominant, variant at the location. A smaller dot next to 
the big dot indicates that there is variation at the location and symbolizes the 
second most common variant there. 
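
The aggregation step described above can be illustrated with a short Python sketch. It is not the AdA project's actual procedure: the input format (pairs of location and reported variant), the function name dominant_variants, the alphabetical tie-breaking rule, and the sample responses are all assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def dominant_variants(responses):
    """Map each location to (dominant variant, second variant or None).

    `responses` is an iterable of (location, variant) pairs; this input
    format is a hypothetical simplification of questionnaire data.
    """
    counts = defaultdict(Counter)
    for location, variant in responses:
        counts[location][variant] += 1

    result = {}
    for location, counter in counts.items():
        # Rank by descending frequency; ties are broken alphabetically
        # (an arbitrary choice made only to keep the sketch deterministic).
        ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        dominant = ranked[0][0]
        second = ranked[1][0] if len(ranked) > 1 else None
        result[location] = (dominant, second)
    return result


# Invented sample responses, for illustration only.
sample = [
    ("Town A", "eins gemerkt"),
    ("Town A", "eins gemerkt"),
    ("Town A", "eins im Sinn"),
    ("Town B", "eins im Sinn"),
    ("Town B", "eins im Sinn"),
]
dots = dominant_variants(sample)
```

In this invented sample, ‘Town A’ would be drawn with a large dot for eins gemerkt and a smaller dot for eins im Sinn, whereas ‘Town B’ would receive a single dot for eins im Sinn.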

In the AdA surveys 1 to 11, data on 35 phrasemes were elicited and presented 
on 39 maps. A full list can be found in the appendix. I will present and discuss 
various examples here with regard to RQ 1. 

Figures 1 and 2 show examples of phrasemes with a small-scale distribution. 
The routine formula jmd. ist gut zufrieden ‘sb. is quite content’ (figure 1) is used 
only in a small area in the northwest of Germany which borders the Netherlands.
The similarity to (Standard) Dutch iemand is goed tevreden is obvious. Figures
2a/b present variants of the German equivalents for ‘don’t take offence’ and ‘you
have to take things as they come’: The Standard German variants are nimm’s mir
nicht übel and man muss es nehmen, wie es kommt. Only in a small area in the far
west of Germany (Saarland and the western part of Rhineland-Palatinate),
Luxembourg, and the southern part of East Belgium (around St. Vith) are the variants
hol’s mir nicht übel (figure 2a) and man muss es holen, wie es kommt (figure 2b)
common. These variants can be traced back to a general replacement of the German
verb nehmen ‘to take’ by holen in the Moselle Franconian dialects (see RhWb
3: 759-760); as the two examples demonstrate, this also affects phrasemes.


[Map: gut zufrieden. Prompt: Wie geht es dir? Danke, ich bin gut zufrieden. Legend: üblich ‘common’, ab und zu ‘now and then’, unüblich ‘uncommon’. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 1: Distribution of gut zufrieden ‘quite content’ (AdA VIII-6h)
[Map for figure 2a: nehmen / holen IV. Prompt: Nimm's/Hol's mir nicht übel! Legend: nimm's, hol's. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

[Map for figure 2b: nehmen / holen V. Prompt: man muss es nehmen/holen, wie es kommt. Legend: nehmen, holen. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 2b: Distribution of man muss es nehmen/holen, wie es kommt ‘you have to take things as
they come’ (AdA IX-6e)
None of the three variants with a small-scale distribution (gut zufrieden; hol’s mir
nicht übel; man muss es holen, wie es kommt) are considered Standard German,
as they are listed in neither Duden 11 nor VWB.

The next group of maps presents phraseological variants with a large-scale 
distribution. Figure 3 illustrates the distribution of the routine formula das geht 
sich [zeitlich] noch aus ‘this will work out (timewise)’. It is used throughout Aus- 
tria, but also in Liechtenstein, South Tyrol, and the Bavarian dialect areas of 
southeastern Germany. Again, the distribution points to a certain dialect area. 
But it is also employed in Standard German: The VWB (2016: 65) marks etw. geht
sich aus as a variant of Standard German in Austria and in the southeast of
Germany. Duden 11, however, has no entry.


[Map: sich ausgehen. Prompt: Ich muss noch etwas einkaufen. Aber dann können wir uns treffen - das geht sich noch aus. Legend: üblich ‘common’, manchmal ‘sometimes’, unüblich ‘uncommon’. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 3: Distribution of das geht sich [zeitlich] noch aus ‘this will work out (timewise)’ (AdA XI-3a)


The meaning of the variants in figure 4a is ‘(to do sth.) free of charge’. There are
three main variants: für umsonst is the most widely distributed and is used as the
standard form; für umme (a phonetically non-standard variant of für umsonst) is
restricted to a small part of the Palatinate in the southwest of Germany; and
für lau, also considered ‘non-standard’, is the dominant form in the north of
Germany as well as the only non-standard variant mentioned in Duden 11.
[Map: unentgeltlich. Prompt: Der tut das für ... Legend: lau, umsonst, umsonst (ohne "für"), umme, gratis (ohne "für"), frei. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 4a: Distribution of variants for ‘(to do sth.) free of charge’ (AdA VIII-4n)

Fig. 4b: Variants for ‘(to do sth.) free of charge’ from Piirainen (2006: 217)
Figure 4a can be compared with a map from Piirainen (2006: 217). Piirainen’s map
(figure 4b), however, is limited to Germany, and it is not extensional, but focuses
on the distribution areas of three non-standard variants: für lau in the west, für
umme in the southwest, and für nasse in the (central) east of Germany. There are
three remarkable differences between the two maps. Firstly, the area on the AdA
map for für umme is smaller and situated further north in comparison to
Piirainen’s map. Secondly, the area of für lau extends much further to the north and
to the east on the AdA map. Thirdly, and most surprisingly, the für nasse area on
Piirainen’s map does not materialize on the AdA map at all, despite für nasse
being at the top of the list of four optional variants (plus one optional box for ‘other’
variants) in the AdA online questionnaire. As it is improbable that such a drastic
change has occurred within one decade,⁸ these discrepancies are more likely to
be due to the different numbers of informants (ca. 3,000 in Piirainen’s study vs.
9,758 in the AdA study) and the different methods of data collection. While
Piirainen targeted professional linguists and students at German departments
throughout the country (Piirainen 2006: 210), the AdA questionnaire was directed
at laypeople (see Möller and Elspaß 2015: 521-526 on the methodology of the
AdA). Moreover, in contrast to the point-symbol maps of the AdA, Piirainen’s area
map does not show the distribution of responses.

Figures 5a and 5b map the distribution of two variants for the German phraseme
for ‘to get sth. done’, one of which is considered non-standard (etw. gebacken
kriegen) and the other standard (etw. gebacken bekommen). Clearly, the
non-standard variant has a much wider distribution area (most of Germany, except
the southeast) than the standard variant. The colour code on the two maps is
as follows: pink dots signify that the phraseme is ‘very common in use’, orange
dots mean ‘fairly common’, and blue dots stand for ‘utterly uncommon’.


8 Piirainen collected her questionnaire data in 2000-2001, and the AdA data for ‘(to do sth.) free 
of charge’ were collected in 2010-2011. There is also no evidence of significant age differences 
between Piirainen's informants and the AdA informants: About two thirds of Piirainen's re- 
spondents (Piirainen 2006: 210) and about half of the AdA informants were under 30 years of 
age. 
[Map for figure 5a: gebacken kriegen. Prompt: Das kriege ich bis morgen nicht mehr gebacken. Legend: sehr üblich, kommt (ab und zu) vor, völlig unüblich. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

[Map for figure 5b: gebacken bekommen. Prompt: Das bekomme ich bis morgen nicht mehr gebacken. Legend: sehr üblich, kommt (ab und zu) vor, völlig unüblich. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 5b: Distribution of gebacken bekommen ‘to get sth. done [standard]’ (AdA IV-21b)
3.1.2 Awareness and Actual Usage of Phrasemes


RQ 2, regarding differences between the awareness and actual usage of phrase- 
mes, is addressed by this next group of variants. Figures 6a/6b, 7a/7b, and 8a/8b 
show pairs of maps for three phrasemes: das geht dich einen Schmarren an (‘that’s 
none of your business’, figure 6a/b), dicke Backen machen (‘to brag about sth.’, 
figure 7a/b), and mit etw./jmdm. ist kein Blumentopf zu gewinnen (‘you’re not getting
anywhere with this/them’, figure 8a/b). In each case, the phraseme is widely
known, but its actual usage is restricted to a much smaller area. Pink dots in figures
6a, 7a and 8a stand for ‘the phraseme is known’ in the respective locality; in
figures 6b, 7b and 8b, pink dots mean ‘the phraseme is used’ in that locality. Blue
dots signify ‘unknown’ or ‘uncommon’ respectively.

Dictionaries like VWB and Duden 11 do not distinguish between awareness 
and usage. Usually, they only mark the areas in which the phrasemes are used 
(cf. RQ 3). In view of figures 6b, 7b, and 8b, dictionaries’ labels of areal distribution
appear to be somewhat misleading. In Duden 11, das geht Dich einen
Schmarren an is considered a ‘southern German and Austrian’ variant, and VWB
marks it as being used in ‘southeastern Germany and Austria’.⁹ Dicke Backen machen
is labeled as ‘particularly northern German’ by Duden 11, though it is apparently
equally employed in the southwest. And mit etwas/jmdm. ist kein Blumentopf
zu gewinnen is labeled as ‘particularly used in the Berlin vernacular’ by
Duden 11, whereas VWB marks it as being used in ‘Germany and Austria’. Neither
matches the distribution as displayed on the map (Germany, particularly north
and central Germany). These examples illustrate yet again the fundamental problems
of lexicographic labels on the areal distribution of idioms, which Piirainen
has pointed out repeatedly (e.g. Piirainen 2002).


9 To be completely correct, South Tyrol should also be added. Both dictionaries consider this 
phraseme standard, although the online Duden dictionary marks Schmarr(e)n (‘rubbish, tripe’) 
as ‘colloquial’ (https://www.duden.de/rechtschreibung/Schmarren, accessed December 31, 
2018). 
[Map: ... Schmarren ... Prompt: Was ich privat mache, geht dich einen Schmarren an. (Redewendung). Legend: bekannt ‘known’, unbekannt ‘unknown’. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 6a: Distribution of awareness of das geht dich einen Schmarren an (‘that’s none of your
business’) (AdA IV-20a)

[Map: ... Schmarren ... Prompt: Was ich privat mache, geht dich einen Schmarren an. (Redewendung). Legend: geläufig ‘in common use’, nicht geläufig ‘not in common use’. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 6b: Distribution of usage of das geht dich einen Schmarren an (‘that’s none of your
business’) (AdA IV-20b)
[Map for figure 7a: ... dicke Backe(n) ... Prompt: Gestern hat Paul noch (eine) dicke Backe(n) gemacht, und heute kneift er. (Redewendung). Legend: bekannt ‘known’, unbekannt ‘unknown’. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

[Map for figure 7b: ... dicke Backe(n) ... Prompt: Gestern hat Paul noch (eine) dicke Backe(n) gemacht, und heute kneift er. (Redewendung). Legend: geläufig ‘in common use’, nicht geläufig ‘not in common use’. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 7b: Distribution of usage of dicke Backen machen (‘to brag about sth.’) (AdA IV-19b)
[Map: ... Blumentopf ... Prompt: Mit dieser Mannschaft ist kein Blumentopf zu gewinnen. (Redewendung). Legend: bekannt ‘known’, unbekannt ‘unknown’. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 8a: Distribution of awareness of mit etw./jmdm. ist kein Blumentopf zu gewinnen (‘you’re
not getting anywhere with this/them’) (AdA IV-18a)

[Map: ... Blumentopf ... Prompt: Mit dieser Mannschaft ist kein Blumentopf zu gewinnen. (Redewendung). Legend: geläufig ‘in common use’, nicht geläufig ‘not in common use’. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 8b: Distribution of usage of mit etw./jmdm. ist kein Blumentopf zu gewinnen (‘you’re not
getting anywhere with this/them’) (AdA IV-18b)
3.2 Areal Phraseological Variation in Standard German


In the present contribution, I use a definition of ‘standard language’ (or rather,
‘standard varieties’) that is based on the concept of a standard of usage. Thus, a
standard variety can be defined as a variety which is commonly used in contexts
that are perceived as standard language.¹⁰ This may include the usage of conceptually
written language which is widely accepted as appropriate and used in formal
and public situations in any region of a language area, in this case the
German-speaking area. As the classification at the beginning of section 3 has already
indicated, this standard variety can cover a country or a larger region within a 
country. In this respect, I follow a model of ‘pluriareal standard languages’ rather 
than ‘pluricentric standard languages’ in the sense of ‘plurinational standard 
languages’ (see Elspaß and Dürscheid 2017: 87-89; Elspaß et al. 2017: 70-74, for 
a discussion of the different concepts). 

In this section, I first present three examples from the AdA and then three
examples from a Master's thesis by Lisa Höller (2016), who based her study on
two large electronic corpora of present-day Standard German.

Although the first three examples are taken from the ‘Atlas of colloquial German’
(AdA), the variants display variation that is also valid for the standard.

Figure 9 shows the results for the phraseological variants eins gemerkt/eins 
im Sinn/etc., ‘to carry a digit over’, as when a number greater than 9 (in this case 
10) is transferred to the next position (in this case by adding “1” in the tens posi- 
tion). 


10 “Standard ist das, was in Kontexten, die als standardsprachlich aufgefasst werden, regelhaft
in Gebrauch ist” (Elspaß et al. 2017: 71).
[Map: eins gemerkt. Legend: eins gemerkt, eins im Sinn, behalte eins, eins weiter, merke eins, bleibt eins, übertrage eins, eins herunter, eins hochgestellt, eins oben. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 9: Distribution of usage of eins gemerkt/eins im Sinn/... (lit. ‘one in memory’) (AdA I-19)


The map reveals a clear areal distribution of the two main variants eins im Sinn, 
which is the dominant variant in the north and the west of Germany, and eins 
gemerkt, which is used in the rest of the German-speaking countries. Some no- 
ticeable regional variants are behalte eins in the southwest, merke eins in the 
(north)east of Germany, and bleibt eins or eins weiter in some parts of Austria. 
These variants are not mentioned in Duden 11. 

Figure 10 illustrates the distribution of the variants of New Year wishes in
Standard as well as colloquial German. The distributional areas of four variants
can basically be distinguished: frohes ‘happy’/gesundes ‘healthy’/gutes ‘good’
neues (Jahr) ‘new (year)’ and Prosit Neujahr (with the loan from Latin prosit ‘may
it benefit’). Duden 11 only lists pros[i]t Neujahr, with no indication as to its limited
areal distribution.
[Map: Neujahrswünsche, in der Silvesternacht um 0:00. Legend: Frohes neues Jahr, Frohes neues, Gesundes neues Jahr, Gesundes neues, Gutes neues Jahr, Gutes neues, Prosit Neujahr, Prost Neujahr, Glück im neuen Jahr, Auf [Jahreszahl]. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 10: Distribution of variants of New Year wishes (AdA VIII-1a)


Figure 11 is the only map which seems to identify a national variant, jmdn. auf die
Schaufel nehmen (‘to pull sb.'s leg’), which in colloquial language is only used in
Austria (though not so much in the western parts of Austria¹¹). The almost exclusive
variant in the other German-speaking countries is jmdn. auf die Schippe nehmen
(with jmdn. auf die Schüppe nehmen constituting merely a phonetic variant).


11 The grey dots on the map indicate that the idiom is not used at all, neither in this nor in the
other variant, in many parts of Austria (and elsewhere).
[Map: jmdn. auf die Schippe/Schaufel nehmen (Redewendung). Legend: Schippe, Schüppe, Schaufel, unüblich ‘uncommon’. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 11: Distribution of jmdn. auf die Schippe/(Schüppe)/Schaufel nehmen (‘to pull sb.'s leg’)
(AdA IX-3b)


The map appears to confirm the information given in VWB and Duden 11, which
label jmdn. auf die Schaufel nehmen as an Austriacism. A search in the Austrian
newspaper corpus of the German reference corpus (DeReKo), however, paints a
different picture.¹² Out of 1,125 cases in which the idiom was used, 191 (= 17.1%)
have the variant jmdn. auf die Schippe nehmen.¹³ In other words, in almost every
sixth instance, an idiomatic variant is used in Standard German in Austria which
has no basis in the Austrian spoken vernaculars: the lexemic variant Schippe is
alien to colloquial varieties of German in Austria, as another map from the AdA
confirms.¹⁴

The following three selected results, taken from Höller (2016), focus on areal
standard variation in prepositions in phrasemes. Höller's study is partly based on
the corpus of the Variantengrammatik des Standarddeutschen (VG) ‘Regional variation
in the grammar of Standard German’ and partly on the Deutsches Referenzkorpus
(DeReKo) ‘German Reference Corpus’.


12 Search strings: “(auf die Schippe) /s0 &nehmen”, “(auf die Schüppe) /s0 &nehmen” and
“(auf die Schaufel) /s0 &nehmen”.

13 There are no hits for jmdn. auf die Schüppe nehmen.

14 http://www.atlas-alltagssprache.de/schaufel/, accessed December 31, 2018.


Tab. 1a: Distribution of auf/zu Besuch fahren/kommen/sein/haben (lit.: ‘to go/come/be/have
on (a) visit’) in the VG corpus


      auf Besuch   zu Besuch    total   auf Besuch (%)   zu Besuch (%)
D             62       2,570    2,632              2.4            97.6
A             69         173      242             28.5            71.5
CH            23         105      128             18.0            82.0


Tab. 1b: Distribution of auf/zu Besuch fahren/kommen/sein/haben (lit.: ‘to go/come/be/have
on (a) visit’) in the DeReKo


      auf Besuch   zu Besuch    total   auf Besuch (%)   zu Besuch (%)
D          1,015      41,366   42,381              2.4            97.6
A          1,328       4,185    5,513             24.1            75.9
CH           486       2,973    3,459             14.1            85.9


Tables 1a and 1b present results for variation in the prepositions auf/zu in the 
idiom auf/zu Besuch fahren/kommen/sein/haben (lit.: ‘to go/come/be/have on (a) 
visit’). In order to provide a better overview, the results are summarized for Ger- 
many (D), Austria (A), and Switzerland (CH). With regard to the lexicographic 
representation of such standard variation, I will follow both the VWB’s (2016:
VIII) distinction between specific vs. unspecific variants and also Farø’s (2005:
387) distinction between absolute vs. relative variants. A specific variant is used
exclusively in a certain country and is a shibboleth of that country, whereas an 
unspecific variant is also used in other countries (but not in all countries and re- 
gions). Absolute variants are variants which are the (almost) only variants that 
occur in a speech community, whereas relative variants are those that occur fre- 
quently in a speech community, but are not the exclusive variants in the commu- 
nity. 
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Although the VG corpus is much smaller than the DeReKo (600 million word 
forms vs. 28 billion word forms”), the two corpus searches show similar results. 
The dominant form is clearly zu Besuch ..., while the variant auf Besuch ... is rarely 
used in Germany, more common in Switzerland, and accounts for about a quarter 
of all instances in Austria. 
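
The percentage columns in tables 1a and 1b follow directly from the raw counts. The following is a minimal sketch of that calculation for the VG figures in table 1a; the names `counts_vg` and `shares` are illustrative assumptions and not part of the original study.

```python
# Raw counts copied from Tab. 1a (VG corpus): (auf Besuch, zu Besuch) per country.
counts_vg = {
    "D": (62, 2570),
    "A": (69, 173),
    "CH": (23, 105),
}


def shares(auf: int, zu: int) -> tuple[float, float]:
    """Return the percentage share of each variant, rounded to one decimal place."""
    total = auf + zu
    return round(100 * auf / total, 1), round(100 * zu / total, 1)


for country, (auf, zu) in counts_vg.items():
    auf_pct, zu_pct = shares(auf, zu)
    print(f"{country}: auf Besuch {auf_pct}%, zu Besuch {zu_pct}%")
```

The same function can be applied to the DeReKo counts in table 1b to compare the two corpora row by row.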


Tab. 2a: Distribution of nach dem/zum Rechten sehen/schauen (‘to see that everything is OK’)
in the VG corpus

      nach dem Rechten   zum Rechten    total   nach dem Rechten (%)   zum Rechten (%)
D                  975             0      975                  100.0               0.0
A                   56             1       57                   98.3               1.7
CH                  20            22       44                   47.6              52.4


Tab. 2b: Distribution of nach dem/zum Rechten sehen/schauen (‘to see that everything is OK’)
in the DeReKo

      nach dem Rechten   zum Rechten    total   nach dem Rechten (%)   zum Rechten (%)
D               13,569            15   13,584                   99.9               0.1
A                1,100             1    1,101                   99.9               0.1
CH                 512           578    1,090                   47.0              53.0


In tables 2a and 2b, Höller's results for the distribution of nach dem/zum Rechten
sehen/schauen (‘to see that everything is OK’) are summarized for Germany (D), 
Austria (A) and Switzerland (CH). Again, both corpora render strikingly similar 
results. On the whole, nach dem Rechten sehen/schauen is the clearly dominant 
variant. In Switzerland, however, both variants are equally frequent in use; zum 
Rechten sehen/schauen can be considered a relative Helvetism. (The VWB simply 
marks it as a Helvetism.) 


15 The numbers refer to the time of Höller's investigation. The size of the DeReKo has almost
doubled since.

The last set of tables focuses on the variation of auf dem and am, which is not
restricted to phrasemes (e.g. Das Buch liegt auf dem/am Tisch ‘The book is on the
table.’) but is very noticeable in idioms such as
— auf dem / am Laufenden sein/bleiben ‘to be/keep (oneself) up-to-date’, 

— etwas auf dem / am Kerbholz haben ‘to have something (bad) on the tally’, 
— auf dem / am Zahnfleisch gehen ‘to be on one’s last legs’.


The variants with am are usually presented as typical of Austrian usage, and am 
is often interpreted as a contraction of an + dem (e.g. Burger 2010: 208-209). Both 
assumptions require revision. Firstly, am in these contexts most certainly originates
in a contraction of auf + dem rather than of an + dem (cf. Höller 2016: 30), and
secondly, as Höller's corpus studies show, the proportion of am-forms varies from
phraseme to phraseme, cf. table 3.


Tab. 3: Proportion of am/auf dem in seven idioms in VG and DeReKo subcorpora ‘Austria’ (from
Höller 2016: 53)

phraseme                                   % in VG corpus   % in DeReKo
am richtigen Weg sein                                   8            16
am Laufenden sein/bleiben/sich halten                  18            25
am Prüfstand stehen                                    20            26
etw. am Kerbholz haben                                 26            26
jmdn. am falschen Fuß erwischen                        27            45
am Boden der Tatsachen bleiben                         40            28
am Zahnfleisch gehen                                  100            69


Here, some differences between the results of the corpus searches in the VG and the
DeReKo subcorpora are noticeable. All in all, because of its sheer size, the DeReKo
certainly yields the more reliable results. (For instance, the idiom am Zahnfleisch
gehen appears only 5 times in the subcorpus ‘Austria’ of the VG corpus.)
To conclude, in none of the cases presented in this section could a variant be 
identified as a specific national variant (Germanism, Austrianism, or Helvetism) 
and at the same time an absolute variant. Zum Rechten sehen/schauen appears to 
be a specific idiom variant of Switzerland, but even this Helvetism is only a rela- 
tive variant in the standard language corpora for Switzerland. Likewise, idioms 


Areal Variation and Change in the Phraseology of Contemporary German — 67 


with am are almost specific Austrianisms,¹⁶ but none is an absolute variant in the
corpora for Austria. Eins im Sinn is a 'Germanism' in the sense that it is only used 
in Germany, but here it is also a relative variant, and, more precisely, it shows a 
clear areal distribution in the northwest of Germany only. None of the other vari- 
ants is nation-specific in Standard German. If they are absolute variants in one 
country, they are relative variants in another country. This appears to be a typical 
distributional pattern in the standard language varieties of German. From an em- 
pirical point of view, this kind of standard variation in the German-speaking 
countries may be more appropriately conceptualized as pluriareality rather than 
pluricentricity (in the sense of ‘plurinationality’, see Schmidlin 2006). 


4 Changes in the Areal Variation of Colloquial 
and Standard German 


In section 2, language change was defined as change in the associations between 
linguistic functions and linguistic forms over time. The present section investi- 
gates whether it is possible to establish changes in the areal phraseology of Ger- 
man (RQ 4). I will present two case studies of routine formulae in German. As in 
section 3, the examples are taken from atlases of colloquial German, but they also 
display variation in spoken registers of Standard German. Both case studies use 
a change in real-time framework (Chambers 2003: 212-215). More precisely, the 
findings come from a real-time panel study that uses the same questions and ba- 
sically the same methodology. In both cases, areal distribution maps of routine 
formulae are compared. The older maps are based on data collected for the 
Wortatlas der deutschen Umgangssprachen (WDU) (‘Word Atlas of colloquial Ger- 
man’) in the 1970s, with data on the ‘typical’ expression at a given location pro- 
vided by 1 or 2 informants per location. The AdA data were collected and pre- 
sented in the manner explained in section 3.1 above, representing language use 
approximately one generation later than the WDU data.¹⁷


16 Some of them are also used in South Tyrol. 

17 The hypothesis that the regional distribution of variants from colloquial language has 
changed in recent decades was first tested in Elspaß (2005). In this study, I compared the re- 
gional distribution of eleven WDU maps (nine lexical variables, two syntactic variables) from the 
first two volumes of the WDU (WDU I-II) to eleven equivalent maps from an online pilot study 
for the subsequent AdA conducted in 2002 - thus studying language change across a time span 
of c. 25-30 years. 


68 — Stephan Elspaß 


Figures 12a and 12b illustrate the distribution of variants for ‘a greeting for- 
mula which people would normally use when they enter a local shop in the after- 
noon’. The map in figure 12a is taken from the first volume of the WDU. Figure 12b 
shows the distribution of the same pragmatic variable about 25 years later, based 
on an online survey for the AdA. A comparison of the two maps reveals both sim- 
ilarities and differences, pointing partly to stability and partly to change. As for 
similarities, both maps display a north-south divide along the river Main, which 
has been identified as the main isogloss in the language geography of colloquial 
German (Durrell 1989; Möller 2003; Pickl and Pröll 2019). The dominant form 
south of the river Main is Grüß Gott (lit. ‘(may) God greet you-SG’), with its variant
Grüß Euch (‘(may God) greet you-PL’, in South Tyrol and some parts of Austria)
and the exclusively Swiss German form Grüezi (a phonetic variant of Grüß Euch).
The dominant form north of the Main and along the river Rhine in the southwest 
of Germany was and is Guten Tag. The most obvious change is that in many 
places, particularly in the north of Germany, these polylexical routines have been 
joined or even replaced by the more informal monolexical Hallo. Although it ex- 
ists natively in German (as an old imperative singular form of holen ‘fetch’, cf. 
Pfeifer 2003: 500), its rapid dissemination within one generation is certainly due 
to its status as an internationalism (cf. French allô, English hello, Spanish hola,
Dutch hello, etc.). Another striking change is the spread of moin, which is most
probably the abbreviated form of the Low German and Frisian phraseme
mo(o)i(e)n dag (‘(I wish you/have a) nice day’).¹⁸


18 Today, moin is considered a salient marker of northern German regional identity. Anecdotal 
evidence has it that moin partly owes its rapid dissemination to the adoption of the expression 
as the title of a popular morning show of a private radio channel in the 1990s
(https://de.wikipedia.org/wiki/Moin, accessed December 31, 2018).

Fig. 12a: Distribution of variants of ‘greeting when entering a local shop in the afternoon’
(WDU I-47)

[Map legend: Gruß beim Betreten eines Geschäfts am Nachmittag (‘greeting when entering a
shop in the afternoon’). Variants: Guten Tag, Hallo, Grüß Gott, Moin, Grüezi, Grüß Euch
(Sie/Ihnen), Servus, Hoi, Guten Nachmittag/Abend. Second responses are shown smaller.
Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 12b: Distribution of variants of ‘greeting when entering a local shop in the afternoon’
(AdA II-1)

[Map: Abschiedsgruß unter guten Freunden (‘parting formula among good friends’)]

[Map legend: Grußformel: Verabschiedung unter Freunden (‘parting formula: saying goodbye
among friends’). Variants include: Tschüss, tschüssi, Tschüüs, Tschö, Tschau, Ade, Addi,
Adieu, Auf Wiedersehen, Uf Widerluege, Pfiadi, Servus, Baba, Bis dann, Machs gut,
au revoir. Second responses are shown smaller. Source: Atlas zur deutschen
Alltagssprache, www.atlas-alltagssprache.de]

Fig. 13b: Distribution of variants of ‘saying goodbye after meeting friends’ (AdA X-17a)

Similar developments can be observed in the case of routine formulae for ‘saying
goodbye after meeting friends’, a fairly informal situation (figures 13a and 13b).

I will confine my discussion to the only two phrasemes on these two maps.
Whereas the distribution of mach’s gut (lit. ‘make-SG it good’, mainly in Central
East Germany) has remained relatively stable, the formal routine Auf Wiedersehen
(lit. ‘on seeing (you) again’) has been almost entirely replaced in the north of
Germany, again north of the river Main, by the more informal monolexical variants
Tschüss/Tschüüs/Tschö or by Tschau (cf. Italian ciao, from Venetian sciao ‘(your)
servant’). The Tschüss variants are shortened forms of a tschüs(s)/a tschö and
ultimately go back, via French à dieu or Spanish adiós, to Latin ad deum (‘God be
with you’; cf. also Ade in Southwestern Germany). In present-day German, Auf
Wiedersehen, including its southeastern variant Auf Wiederschau(e)n and the Swiss
German dialect form Uf Widerluege, is restricted to more formal contexts, as
shown in figure 13c, the distribution of routines for ‘saying goodbye to customers
when they leave a local shop’.


[Map legend: Grußformel: Verabschiedung gegenüber KundInnen (‘parting formula: saying
goodbye to customers’). Variants include: Auf Wiedersehen, Auf Wiederschaun, Uf Widerluege,
Ade, Addi, Adieu, Tschau, au revoir, Tschüss, Tschüüs, Tschö, Pfiadi. Second responses are
shown smaller. Source: Atlas zur deutschen Alltagssprache, www.atlas-alltagssprache.de]

Fig. 13c: Distribution of variants of ‘saying goodbye to customers when they leave a local shop’
(AdA X-17b)


While both case studies include instances of long-term changes of form on a
syntagmatic level (Type II change, e.g. a tschüss > tschüss, mo(o)i(e)n dag > moin),
the most striking changes are Type I changes, i.e. cases in which polylexical routine
formulae that are apparently perceived as rather formal have been, or are gradually
being, replaced by monolexical informal expressions (e.g. Guten Tag > Hallo; Auf
Wiedersehen > Tschüss).


5 Conclusion 


This paper has identified areal variation and change as a still barely-researched 
subject in the phraseology of German. Case studies on variation and change in 
colloquial German have been presented, based on data from online surveys that 
aim to elicit spoken regional vernaculars and variation in Standard German 
(AdA), as well as from a regionally-balanced corpus of Standard German (VG). 

With regard to RQ 1, the AdA has proven excellent for studying and present- 
ing the areal distribution of phraseological variants in German as well as changes 
in their areal patterns. As in the lexis of colloquial German in general (cf. Pickl 
and Pröll 2019), neither traditional dialect boundaries nor contemporary political
borders can fully account for the areal structure of phraseme variation in German. 
One of the most striking contrasts exists between the north and the south of the 
German-speaking countries, with the river Main as a prominent dividing line (see 
e.g. the salutations Guten Tag vs. Grüß Gott). 

RQ 2 asked whether it is possible to establish differences between awareness 
and actual usage. Informants were asked (i) whether a certain phraseme is com- 
monly known in their local town and (ii) whether it is commonly used. As ex- 
pected, the three case studies showed that the phrasemes under investigation are 
widely known, but that their usage is restricted to a smaller area (see e.g. das geht
dich einen Schmarrn an ‘that’s none of your business’, which is known in all
German-speaking countries, but is actively used only in Bavaria, Austria and South
Tyrol).

As for the representation of areal variation in dictionaries (RQ 3), Piirainen
already demonstrated in a number of essays that areal tags, even in phraseological
dictionaries, are often sketchy and sometimes plainly incorrect. In many instances,
dictionaries label areal variants of Standard German as simply ‘colloquial’ or
‘regionally used’, implying that they are non-standard. In the present
paper, it has emerged that the tags given by VWB, a dictionary of areal variants 
in Standard German, are often more precise than those of Duden 11, the most 
prominent phraseological dictionary of German. The VWB, however, often does 
not differentiate between absolute and relative variants (see e.g. the figures for
zum Rechten sehen/schauen ‘to see that everything is OK’ or jmdn. auf die Schaufel
nehmen ‘to pull sb.’s leg’ in section 3.2).

Finally, a comparison of maps from two linguistic atlases of colloquial Ger- 
man, WDU and AdA, has revealed changes in the areal phraseology of German 
which have occurred in recent decades (RQ 4). In the case of the routine formulae 
examined here, the major finding was that polylexical formal routines have gradually
been replaced by monolexical informal expressions (see e.g. Guten Tag >
Hallo).

In methodological terms, the case studies have demonstrated the potential 
of both online surveys and corpus studies for gaining new insights into the areal 
variation of phrasemes and their change in German. Such new methods can help 
to advance this relatively young field of research, for which Elisabeth Piirainen
aptly coined the term ‘areal phraseology’ and to which she contributed some of her
pioneering work.
Appendix: Phrasemes in AdA


Survey # Variable 

1 eins gemerkt/eins im Sinn/... (Question 19) 

2 Gruß beim Betreten eines Geschäfts am Nachmittag (Question 1) 
Antwort auf „Danke“ (Question 2) 

3 etw. ist nicht nötig/etw. braucht's nicht (Question 4b) 

Ach komm!/Ach geh! (Question 5d) 

4  mit jmdm. ist kein Blumentopf zu gewinnen (Awareness) (Question 18a)
mit jmdm. ist kein Blumentopf zu gewinnen (Usage) (Question 18b) 
(eine) dicke Backe(n) machen (Awareness) (Question 19a) 

(eine) dicke Backe(n) machen (Usage) (Question 19b) 

Das geht Dich einen Schmarren an (Awareness) (Question 20a) 
Das geht Dich einen Schmarren an (Usage) (Question 20b) 

Das krieg ich nicht gebacken! (Awareness) (Question 21a) 

Das krieg ich nicht gebacken! (Usage) (Question 21b) 

7  Neujahrswünsche - in der Silvesternacht um 0:00 Uhr, wenn man auf das neue Jahr
anstößt (Question 1a)

Neujahrswünsche am 1. Januar (Question 1b) 
Wunsch in den Tagen vor dem 1. Januar (Question 1c) 


10 


11 
dieses Jahr/heuer (Question 4d)

‘unentgeltlich’ (Question 4n) 

gutzufrieden (Question 6h) 

jmdn. auf die Schippe/Schüppe/Schaufel nehmen (Question 3c) 

Verbreitung von sich ausgehen: Das geht sich noch aus. i.S.v. ‘Es ist noch genug 
Geld/Zeit da' (Question 7g) 

einen Purzelbaum machen/schlagen/schießen (Question 8c) 

‘Donnerstag vor dem Rosenmontag’ (Question 1) 

Verabschiedung unter Freunden (Question 17a) 

Verabschiedung gegenüber KundInnen (Question 17b) 

Kalter Hund/Schwarzer Hund/... (Question 1i)

hin ... zurück/ hinzu ... rückzu/ ... (Question 2g) 

Erwiderung auf Schönes Wochenende! (Question 2h) 

Verbreitung von sich ausgehen: Das geht sich (noch) aus. [Wdh.] (Question 3a) 
Verbreitung von Du kannst doch nicht einfach hingehen und x tun! (Question 3b) 
Verbreitung von Das war doch sowas von albern/dumm ...! (Question 3e) 

Verbreitung von Das gehört geändert! (Question 3f) 

Verbreitung von „Der kann das ab.“ (Question 3i) 

20 geradeaus/genau/... (beim Zahlen) (Question 6a) 

Haben Sie noch einen Weg?/ (etwas) in der Nähe zu tun/erledigen ? (Question 6b) 

Das ist mir egal/gleich/wurscht!/Das kommt nicht drauf an. (Question 6c) 

Wunsch beim Essen im Restaurant: Guten Appetit!/Einen guten!/Mahlzeit/... (Question 6d)
Wunsch beim Essen in der Kantine: Guten Appetit!/Einen guten!/Mahlzeit/... (Question 6e)
aufpassen wie ein Haftelmacher/Heftelmacher/Heftlimacher/Schießhund/Luchs/...
(Question 6f)
An Analysis of Basque Collocations Formed
by Onomatopoeia and Verbs in a
Translational Corpus of Literary Texts


Abstract: Collections of Basque proverbs and idioms have been compiled since 
the 16th century, but it was not until recently that interest in the study of
phraseological units (PU) arose among researchers from different fields. This paper
analyzes the use of a special type of phraseological unit, namely collocations
formed by onomatopoeia and verbs, from a translational perspective in the language
combination German-Spanish-Basque. For that purpose, some introductory
remarks will first be given regarding the realities of the Basque language and
research on Basque phraseology. Then, the object of study will be presented, and 
the importance of Basque onomatopoeia will be highlighted. Since this is a cor- 
pus-based study, details on the compilation of the digitized, parallel, and multi- 
lingual corpus will be outlined, and it will be shown how these types of colloca- 
tions were extracted from such a corpus in a semi-automatic way. In the 
translation analysis, it will be shown how one translation option stands out and 
how different factors (indirectness, for instance) influence Basque translations. 


1 Introduction 


The Basque language, spoken in the Basque Autonomous Community, Navarre, 
and the three provinces (Labourd, Basse-Navarre, and Soule) located within the 
French department of the Pyrénées-Atlantiques, is a minority language, not only 
due to the small number of speakers, but also because of the power relationships 
between the languages that coexist (i.e., the relationship between Spanish/ 
French and Basque is not an equal, but a diglossic one), and this influences, 
among other things, translation activity in the Basque Country. The number of 
indirect translations made through Spanish versions instead of original (in this 
case, German) versions is, for instance, an obvious consequence, as will be 
shown later in this paper. At the level of language, Altzibar et al. (2011: 2) also
mention that “Basque is currently undergoing a process of unification,
standardisation, and adaptation to new uses. However, the influence of Spanish and
French, especially through calque, is having an almost decisive influence on this
process of renovation.”

Indeed, “[i]n the last 30 years there has been an intense normalization effort
in defining a unified form of Basque and in modernizing the language, as well as 
in extending its use from everyday affairs to high culture and science” (Uribarri 
2011: 248). Although Basque has existed for centuries in many different dialects, 
the standard form, Euskara Batua, has a very short history of around 50 years, 
and its written literary tradition is short as well. The academic research on trans- 
lation from/into Basque is very recent (Barambones Zubiria et al. 2015: 123); alt- 
hough authors have been compiling collections of Basque idioms and proverbs 
since the 16th century, not many researchers have devoted time and effort to the 
scientific study of Basque phraseological units.

From a contemporary and academic perspective, Aierbe (2008) analyzed the 
translation of phraseological units into Basque in administrative texts. An
interesting conclusion she draws is the lack of fixedness in the Basque language and
its phraseology, not only in the standard language but also in specialized texts.
She attributes this to the fact that, as mentioned before, standard Basque was only
recently established, and to the sociolinguistic situation of the language.

It has also been mentioned that another consequence of the diglossic situation
of the Basque language is the creation and use of calques from other languages,
namely Spanish and French. Some of these calques are collected on the
webpage Kalkoen Behatokia¹ (“Observatory of Linguistic Calques”), where there
is a special section on phraseological calques. During the last few years, members
of this project have published papers on the use of phraseological calques in the 
Basque media (Alberdi et al. 2011), the use of collocations in the media (Altzibar 
2004), Basque collocations (Altzibar et al. 2011), and Basque idioms (Altzibar and 
Bilbao 2016). 

In the field of Natural Language Processing, within the IXA research group at 
the University of the Basque Country, authors such as Urizar (2012), Gurrutxaga 
(2014) and Iñurrieta et al. (2016) have worked on the automatic processing of
Basque idioms and collocations. 

Within Translation Studies, the author of this paper presented her PhD thesis 
(Sanz-Villar 2015) on the translation of phraseological units in the language com- 
bination of German-Spanish-Basque based on a parallel and multilingual corpus 
of literary texts. The analysis was limited to specific somatic phraseological units 
and binomials. The aim of the present article is to continue contributing to the 
field of Basque phraseological research from a translational perspective. For that 


1 http://www.ehu.eus/en/web/eins/kalkoen-behatokia, accessed March 28, 2018. 
purpose, in the following section, the object of study will be presented: a specific
type of phraseological unit, that is, Basque collocations formed by verbs and on- 
omatopoeia, i.e. words with a more or less direct relationship between sound and 
meaning. Then, from a methodological perspective, the focus will be on the com- 
pilation of the corpus from which these collocations were extracted, and on the 
automatic extraction of these patterns. Once the collocations to be analyzed are 
selected and extracted, the translation analysis will be conducted and the author 
will conclude with some final remarks. 


2 Basque Collocations Formed by Onomatopoeia 
and Verbs 


Two approaches are usually distinguished when defining the boundaries of phra- 
seology: the linguistic or phraseological approach and the statistical approach 
(Gurrutxaga 2014: 15). The former understands phraseology as a continuum with 
fixed phraseological units at the one end and flexible constructions at the other. 
The relation between the elements that constitute the phraseological unit is
determined by their syntactic relationship and not the distance between them.
According to Bernardini (2007: 1), “[p]hraseological approaches attempt to tell
collocations apart from free combinations on the one hand, and from other lexical
restriction phenomena on the other.” Within the statistical approach, by contrast,
distance or window span plays a very important role, as does the co-occurrence
of the elements. As stated by Moon (1998: 26), “[c]ollocation
typically denotes frequently repeated or statistically significant co-occurrences,
whether or not there are any special semantic bonds between collocating items.”
Word combinations are extracted from corpora and can be located in a continuum 
that goes from Sinclair's (1987) idiom principle to the open choice principle. Here, 
the concept of collocation is central because collocations are more frequent in the 
use of language than idioms. 
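
To make the role of the window span in the statistical approach concrete, the following minimal sketch counts the words that occur within a fixed number of tokens of a node word. It only illustrates raw co-occurrence frequency and applies no significance measure. The function name `cooccurrences`, the `span` parameter and the toy token list (with placeholder tokens `x` and `y`) are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def cooccurrences(tokens: list[str], node: str, span: int = 4) -> Counter:
    """Count tokens found within `span` positions to the left or right of `node`."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Toy token sequence, invented for illustration; 'x' and 'y' are placeholders.
tokens = ["kar-kar", "egin", "x", "kar-kar", "egin", "y", "tipi-tapa", "joan"]
window_counts = cooccurrences(tokens, "kar-kar", span=1)
print(window_counts.most_common())
```

With a wider span, more distant words enter the count, which is why the choice of window size matters in this approach, whereas the phraseological approach relies on the syntactic relation instead.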

Following the linguistic or phraseological approach, Altzibar et al. (2011) pre- 
sented a taxonomy of Basque collocations (the first and only one to the author's 
knowledge) based on the well-known classification proposed by Corpas Pastor 
(1996) for Spanish. According to their morphosyntactic structure, Basque collo- 
cations are divided into three main classes: noun-based, adjective-based, and ad- 
verb-based. At the same time, each of them is made up of several subclasses. 
Among the adverb-based collocations, in the framework of this study the focus 
will be on one specific subclass, namely the type of collocations that are made up 


82 — Zuriñe Sanz-Villar 


of an instance of onomatopoeia (in adverbial function) and a verb, such as
dinbi-danba jo [to hit repeatedly, to fire a shot, to toll a bell], zanga-zanga edan [to drink
with large gulps or with great desire], or tipi-tapa joan [to march/walk step by
step], as exemplified in Altzibar et al. (2011: 10).² A conclusion drawn by the
authors of the paper indicates that “an important number of the morphosyntactic
patterns of Basque collocations are different to the Romance languages of the
region” (Altzibar et al. 2011: 11). The collocation type “onomatopoeia + verb” is
mentioned as one of those different patterns. Apart from this feature, it is worth
mentioning that there is a large number of them in not only everyday but also
educated language.

According to Ibarretxe-Antuñano (2006), the linguistic study of onomatopoeia
has long been neglected. However, studies on onomatopoeia carried out in
different languages have shown that they constitute an essential part of a
language and that they are of considerable importance in terms of quantity.³ Basque
onomatopoeia are characterized by three features: total or partial reduplication,
the use of unusual phonological and prosodic elements (for instance, -dz), and
the association of certain sounds with certain meanings (Ibarretxe-Antuñano
2006: 151). According to Ibarretxe-Antuñano, the first feature does not appear
very often in other European languages, but it is one of the most used strategies
in Basque (not only with regard to onomatopoeia but also as a mechanism to
express emphasis).
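
The first feature, total or partial reduplication, can be illustrated with the forms cited above. The sketch below classifies a hyphenated form as totally reduplicated when both halves are identical. Treating two halves with the same consonant skeleton as partial reduplication is a simplifying assumption made here for illustration; it is not a definition given in the literature cited, and the function names are hypothetical.

```python
VOWELS = set("aeiou")


def skeleton(part: str) -> str:
    """Return the consonant skeleton of a word part, e.g. 'dinbi' -> 'dnb'."""
    return "".join(ch for ch in part.lower() if ch not in VOWELS)


def reduplication_type(form: str) -> str:
    """Classify a hyphenated form as 'total', 'partial', or 'none'.

    Assumption: 'partial' means the two halves differ only in their vowels.
    """
    parts = form.split("-")
    if len(parts) != 2 or not all(parts):
        return "none"
    first, second = parts
    if first.lower() == second.lower():
        return "total"
    if skeleton(first) and skeleton(first) == skeleton(second):
        return "partial"
    return "none"


for form in ["zanga-zanga", "dinbi-danba", "tipi-tapa", "kar-kar"]:
    print(form, reduplication_type(form))
```

A heuristic of this kind would only cover hyphenated spellings and vowel alternation; other reduplication patterns would need additional rules.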

From a morphosyntactic point of view, different grammatical categories can
be found (nouns, adverbs, verbs, adjectives, interjections) among Basque
onomatopoeia, and, due to the fact that in Basque it is quite easy to create new words
through derivation and composition, it is not unusual for new words to be created
based on onomatopoeia (Ibarretxe-Antuñano 2006: 153). Semantically, they are
mainly used in the following semantic fields: actions and activities, animals, 
plants, atmospheric phenomena, musical instruments, physical and mental fea- 
tures, tools, things, child language, large quantities, nature, and sexual terms. 
The first group is the largest among Basque onomatopoeia, and is divided into 


2 Ibarretxe-Antuñano (2006: 153), making reference to Etxepare (2003), refers to this type of
collocation as “complex predicates” and adds that they are very common in languages with
onomatopoeia. She also mentions that the verbs accompanying the onomatopoeia are usually
“dummy verbs”, i.e., verbs such as (to) make, (to) say, (to) think, and so on. Therefore, it is not the
verb but the onomatopoeia that provides the real meaning of the construction
(Ibarretxe-Antuñano 2012: 150-151).

3 Referring to Basque, Schuchardt (1925: 18), for instance, says that “das Baskische ist sehr reich 
an deutlichen Schallworten”, emphasizing the abundance of onomatopoetic constructions in 
Basque.

different categories: motion, communication, light, sound, beverage/food,
destruction, hitting, boiling, emotions, body functions, and others
(Ibarretxe-Antuñano 2012: 152-160).

The main limitation of Ibarretxe-Antuñano and Martínez Lizarduikoa (2006)
is that it does not reflect the real use of onomatopoeic constructions,
since all the examples are extracted from different dictionaries and collections
containing Basque multiword expressions.⁴ Ibarretxe-Antuñano also concludes
that studies regarding the translation of onomatopoeia constitute a field that
requires further research in the future (Ibarretxe-Antuñano 2012: 171). This paper
intends to make a contribution to filling this gap by analyzing the use of
collocations formed by onomatopoeia in children's and adult literature texts translated
from German into Basque.


3 Compilation and Features of the Corpus 


In order to conduct such an analysis, it was first necessary to compile a corpus 
consisting of German-into-Basque literary translations. For that purpose, the fol- 
lowing steps were taken: the creation and description of a catalog consisting of 
literary texts that have been translated in the language combination German- 
Basque, specification of the criteria for selection of the texts that would be part of 
the corpus, and (once the texts were selected) digitization, cleaning, tagging, 
aligning, and uploading the texts to a database. 

The catalog, known as Aleuska, which was updated until 2013, includes 710
entries that can be divided into different text types: theater (1%), essay (8%),
adult literature (8%), poetry⁵ (20%), and children's literature (63%). The first
decision in the selection of the texts was based on this distribution: Children's
literature (CL) texts and adult literature (AL) texts were included in the corpus on the
basis of their representativeness in the catalog. The chronological factor was
another aspect that was taken into account when defining the selection criteria. In
the catalog, as far as AL and CL texts are concerned, the number of translations 
published starts increasing from the year 1980 onwards. For this reason, the texts 
included in the corpus are translations published from that year on. In addition, 


4 However, in the paper from 2012, Ibarretxe-Antuñano includes a section about the pragmatic
functions of Basque onomatopoeia, and in this section the reader can find how these
constructions are used in different contexts, such as oral literature, poetry, comics, and advertising.

5 Regarding poetry, it has to be mentioned that many entries in the Aleuska catalog consist of 
single poems.

the CL-AL sub-catalog includes texts by 126 different German authors and 127
different translators. Thus, source and target author diversity was another criterion
to be considered when selecting the texts. The final factor that needs to be men- 
tioned is the mode of translation; in other words, the fact that the CL and AL texts 
of the Aleuska catalog were identified as direct translations made from the Ger- 
man source text or as indirect translations carried out from an intermediary text 
(most of the time the Spanish version) was of great importance (not only for the 
compilation of the corpus but also for the translation analysis). 

To sum up, the corpus is made up of 24 CL texts and 24 AL texts, which rep- 
resents around 3.5 million words. As for author diversity, works from 30 different 
German authors and 28 different translators were selected; regarding the mode 
of translation, there are 34 direct translations and 14 indirect translations. With 
reference to this last feature, it is important to bear in mind that they should be 
regarded as assumed direct and indirect translations, since it was not always easy 
to obtain this information from the catalog. 

For the compilation of the corpus, a tool called TAligner was used. The latest 
versions of this program were developed within the TRALIMA/ITZULIK research 
group⁶ at the University of the Basque Country, and one of its strong points is that
it allows the simultaneous alignment of more than two texts. This option was
indispensable for the compilation of the present corpus, because the indirect
translations had to be aligned with both their source and intermediary texts.

After all the bi- and tri-texts had been digitized, they were cleaned, tagged,
and aligned. As Figure 1⁷ shows, the tool provides all three functions, limpiar
(clean), etiquetar (tag), and alinear (align), and allows the simultaneous
alignment of several texts, three in this case:


6 The website of the research group can be accessed via this link:
https://www.ehu.eus/en/web/tralimaitzulik/home, accessed March 28, 2018.
7 Figures 1 and 2 represent the latest version of the tool, known as TAligner 3.0.

Fig. 1: Functions of the program TAligner 3.0


Once all the texts are aligned, the user can make queries by using the option Con- 
sultar Corpus. The example in figure 2, for instance, shows the search for the 
Basque collocation kar-kar egin [to laugh] in the corpus. In this case, the author 
was querying the whole corpus, but since metadata, such as author’s name, 
translator, genre, and so on, was introduced when tagging the texts, the queries 
can be limited to words or word combinations in certain texts translated by cer- 
tain translators or texts written by certain authors, just to mention some possibil- 
ities. 
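The idea of restricting a query by metadata can be illustrated with a minimal Python sketch. It is not TAligner's implementation or interface: the `Segment` class, its field names, and the `query` function are hypothetical, and the two segments are examples quoted in this paper, used here only as toy data. The sketch matches literal substrings, so inflected or discontinuous forms would need lemma-based matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned unit with its metadata (all field names are hypothetical)."""
    source: str      # German source segment
    target: str      # Basque target segment
    author: str
    translator: str
    genre: str       # "CL" or "AL"


def query(segments, phrase, **metadata):
    """Return the segments whose target text contains `phrase` and whose
    metadata match every keyword filter (e.g. genre="CL")."""
    needle = phrase.lower()
    return [
        seg for seg in segments
        if needle in seg.target.lower()
        and all(getattr(seg, key) == value for key, value in metadata.items())
    ]


# Toy data: two aligned segments quoted elsewhere in this paper.
SEGMENTS = [
    Segment(
        source="Das schwammige Weib lachte aus vollem Hals.",
        target="Andre puztuak kar-kar egin zuen barre.",
        author="Alfred Döblin", translator="Anton Garikano", genre="AL",
    ),
    Segment(
        source='"Ja, ja", brummte Tobi, der sich mindestens fünf Jahre '
               "zu alt für eine Holzeisenbahn fühlte.",
        target='"Bai, bai", mar-mar egin zuen Tobik, zeini zurezko trenekin '
               "jostatzeko gutxienez bost urte zaharregia zela iruditzen zitzaion.",
        author="Angela Sommer-Bodenburg", translator="Edurne Azkue", genre="CL",
    ),
]

# Query the whole toy corpus, then only the children's literature texts.
for seg in query(SEGMENTS, "kar-kar egin"):
    print(seg.translator, "|", seg.target)
for seg in query(SEGMENTS, "egin", genre="CL"):
    print(seg.translator, "|", seg.target)
```

Keeping the metadata on each aligned segment means that any combination of filters (author, translator, genre) can be applied without re-aligning the texts.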



Fig. 2: Querying the corpus with the program TAligner 3.0 


4 Extraction of Potential Collocations 


The aim is to extract specific grammatical patterns from the corpus that may
constitute collocations. For this purpose, it is necessary to tag the corpus at a
part-of-speech (POS) level. This task was performed with the help of a Natural Language
Processing tool called IXA pipes⁸, developed at the University of the Basque
Country by members of the IXA research group (Agerri et al. 2014). Although this
tool allows texts to be tagged at different levels, the annotation options used in
the present case were tokenization, lemmatization, and POS tagging. As for the
process, only the TXT file of each text needs to be annotated (the Basque target
texts in this case); for each input file, the tool outputs a file in NAF format
that is tokenized, lemmatized, and tagged at the POS level.

To conduct the extraction of the potential collocations, a toolkit called Foma
(Hulden 2009) was employed. Created by Mans Hulden, it is a free and open-source
tool that can serve different purposes within Natural Language Processing. For the
present study, code to extract user-defined grammatical patterns based on the
information in the NAF-tagged files was written⁹ and then processed with Foma.
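The general idea of extracting a user-defined grammatical pattern from annotated text can be sketched in Python. This is not the Foma code used in the study: the `Token` structure, the coarse tags ("N", "ADV", "V", "CONJ"), and the tag assigned to the onomatopoeia are assumptions made for illustration, and the tokens are typed in by hand rather than read from a NAF file. The sketch only captures a word immediately followed by a verb.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str    # word form as it appears in the text
    lemma: str   # lemma assigned by the tagger
    pos: str     # hypothetical coarse POS tag


def extract_pairs(tokens, first_pos, second_pos="V"):
    """Yield (word form, verb lemma) for every pair of adjacent tokens in which
    the first token's tag is in `first_pos` and the second token is a verb."""
    for left, right in zip(tokens, tokens[1:]):
        if left.pos in first_pos and right.pos == second_pos:
            yield (left.form.lower(), right.lemma)


# "Franz tipi-tapa doa eta badaki" (example 2 in Tab. 1), annotated by hand.
SENTENCE = [
    Token("Franz", "Franz", "N"),
    Token("tipi-tapa", "tipi-tapa", "ADV"),
    Token("doa", "joan", "V"),
    Token("eta", "eta", "CONJ"),
    Token("badaki", "jakin", "V"),
]

print(list(extract_pairs(SENTENCE, first_pos={"N", "ADV"})))
```

Because only adjacent tokens are compared, a collocation whose verb does not directly follow the onomatopoeia is missed by a pattern of this kind.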

The number of patterns extracted was very large (48,296), a result which was
expected considering the widespread use of these patterns as well as the size of
the corpus. However, advantage was taken of a special feature of the Basque
onomatopoeia under study: the fact that they contain a hyphen in the onomatopoeia
part of the word combination helped reduce the initial list to 2,617 patterns.
Then, with the help of a number of Unix commands (sed, egrep, and so on) and
some manual work, unnecessary combinations were removed¹⁰ and a list of 428
patterns was compiled. Many of those combinations were partial or total
reduplications that do not constitute onomatopoeia. As mentioned in
Ibarretxe-Antuñano (2012: 138), reduplications, whether or not they constitute
onomatopoeia, are a frequently employed resource in Basque. From the 428 patterns,
147 different types were manually identified as potential collocations.
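The hyphen-based reduction described above can be sketched as follows. The study itself used Unix commands and manual work; this Python version is only an illustration. The function names are hypothetical, and the consonant-skeleton test for partial reduplication is a rough heuristic introduced here, not a criterion taken from the study. The candidate list combines word combinations cited in this paper with one invented non-hyphenated example (liburua irakurri, 'to read the book').

```python
import re

HYPHENATED = re.compile(r"^\w+(?:-\w+)+$")
VOWELS = re.compile(r"[aeiou]")


def has_hyphenated_first_word(pattern: str) -> bool:
    """True if the first word of a candidate pattern contains a hyphen."""
    first = pattern.split()[0]
    return bool(HYPHENATED.match(first))


def reduplication_type(word: str) -> str:
    """Classify a hyphenated word as 'total', 'partial', or 'none'.

    Heuristic (an assumption): identical parts count as total reduplication;
    parts that share the same non-empty consonant skeleton count as partial.
    """
    parts = word.lower().split("-")
    if len(parts) < 2:
        return "none"
    if all(part == parts[0] for part in parts):
        return "total"
    skeletons = [VOWELS.sub("", part) for part in parts]
    if skeletons[0] and all(s == skeletons[0] for s in skeletons):
        return "partial"
    return "none"


CANDIDATES = [
    "sigi-saga egin",      # onomatopoeia + verb
    "kar-kar egin",        # onomatopoeia + verb
    "gutun-azala ireki",   # compound, excluded manually in the study
    "liburua irakurri",    # invented example without a hyphen
]

for pattern in CANDIDATES:
    if has_hyphenated_first_word(pattern):
        first = pattern.split()[0]
        print(pattern, "->", reduplication_type(first))
```

A filter of this kind only narrows the list. Whether a reduplicated form is onomatopoeia (kar-kar) or not (astiro-astiro) still has to be decided manually, as was done in the study.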

Since many of those 147 patterns occur in the corpus several times, it was 
decided to limit the study to those patterns that, according to the semantic clas- 
sification mentioned in section 2, represent the largest group among Basque on- 
omatopoeia: the ones that describe the semantic field of actions and activities. 
All in all, 66 types and 162 tokens were selected and queried for the subsequent 
translation analysis. 


8 http://ixa2.si.ehu.es/ixa-pipes/, accessed March 28, 2018. 

9 I hereby thank Iñaki Alegria, member of the IXA research group, for helping me to write the
code.

10 Many of them constituted compounds, such as gutun-azala ireki [to open the envelope] and 
were easy to identify and exclude from potential collocations. Many other results indicated that 
unnecessary spaces were added between the constituents of a compound, and thus, only a part 
of the compound was extracted. For instance, in monopoly-jolastu [to play Monopoly], the word 
game or something similar is missing.

The ascription of semantic fields to the patterns found in the corpus was not
always straightforward. While in some cases there was no doubt about the category
a specific collocation belonged to (for instance, mauka-mauka jan, dir-dir
egin¹¹), other cases (such as plisti-plasta egin¹²) were more controversial. Be that
as it may, it is worth mentioning that the largest semantic field, in terms of
occurrences, was the field expressing motion, with 44 occurrences. In fact, as for
the group of onomatopoeia describing motion, Ibarretxe-Antuñano (2012: 153)
mentions that it is one of the largest in Basque: "El grupo de las onomatopeyas que
describen el movimiento es uno de los más numerosos en euskera."

Analyzing the verbs that accompany the onomatopoeia, it may be concluded
that the most frequent by far is the above-mentioned "dummy verb" egin [to do; to
make] with 88 occurrences, followed by the verb joan [to go] with 14 occurrences.
Thus, in the case of those dummy verbs, as mentioned in Ibarretxe-Antuñano
(2012: 150-151), the onomatopoeia is the component providing the real meaning
of the construction.


5 Translation Analysis 


During the extraction of the collocations and the translation analysis, the main
focus was on the target texts and culture, which is in line with the target-oriented
approach established within Descriptive Translation Studies (Toury 2012). In
other words, the queries were made based on the Basque target texts, and thus
the Basque translations and their cultural context served as the starting point.
Then, the target texts' outputs were compared to their German (and Spanish)
source texts, and the following translation options were identified:

– No PU-Collocation: The counterpart of the Basque collocation is not a PU in
the German source text; it is usually a single verb.

– Collocation-Collocation: A German collocation has been translated with
another collocation in the Basque target text.

– Idiom-Collocation: The German equivalent of the Basque collocation is an
idiom.


11 Mauka-mauka jan means ‘to eat voraciously, greedily’ and dir-dir egin ‘to shine; to sparkle; 
to gleam; to glitter', and they clearly belong to the semantic groups beverage/food and light. 

12 In plisti-plasta egin, the word plisti-plasta is the onomatopoeia of splashing water. Due to the 
sound created through the action of splashing water, it was ascribed to the semantic group of 
sound, but it may also belong to the semantic field expressing motion.

– Ø-Collocation: There is no counterpart in the German source text for the
Basque collocation; the Basque collocation is an addition.


The examples in table 1 may serve for a better understanding of each of the trans- 
lation options explained above: 


Tab. 1: Examples of the different translation options found in the corpus


1. Translation option: No PU-Collocation
   Source text: Ein gigantischer Fluss schlängelt sich behäbig seinen Weg. (HLJde)¹³
   [A gigantic river meanders its way slowly.¹⁴]
   Target text: Ibai ikaragarri handi batek bere bideari jarraitzen dio astiro-astiro,
   sigi-saga eginez. (HLJeu)¹⁵ [A gigantic river follows its way slowly, zigzagging.]

2. Translation option: Collocation-Collocation
   Source text: [...] Franz geht mit kleinen Schritten und weiß: (BAde)¹⁶
   [Franz walks in small steps and knows:]
   Target text: [...] Franz tipi-tapa doa eta badaki: (BAeu)¹⁷
   [Franz walks pitter-patter and knows:]

3. Translation option: Idiom-Collocation
   Source text: Das schwammige Weib lachte aus vollem Hals. (BAde)
   [The spongy woman laughed loudly.]
   Target text: Andre puztuak kar-kar egin zuen barre. (BAeu)
   [The swollen woman laughed loudly.]

4. Translation option: Ø-Collocation
   Source text: Es gibt auch unter uns Zöglingen etwas wie einen aus Luft und Nichts
   herausgegriffenen Zeitungenklatsch. (JGde)¹⁸ [There is also something like
   newspaper gossip plucked out of the air and nothing among us pupils.]
   Target text: Ikasleon artean ere badabil zerbait, airetik eta hutsetik bolo-bolo
   zabaltzen diren egunkarietako hizki-mizkien antzekorik. (JGeu)¹⁹ [There is also
   something among us pupils, similar to the newspaper gossip that extends
   everywhere, plucked out of the air and nothing.]


13 Knister (2003): Hexe Lilli auf der Jagd nach dem verlorenen Schatz. Würzburg: Arena. 

14 All English translations are by the author of the paper. They are intended to show the literal 
meaning of the examples. 

15 Knister (2003): Kika Supersorgina altxorraren bila. Bilbao: Gero (translator: Iñaki Aristondo). 
16 Döblin, Alfred (1929): Berlin Alexanderplatz. Berlin: Fischer.

17 Döblin, Alfred (1929): Berlin Alexanderplatz. Berlin: Fischer (translator: Anton Garikano).

18 Walser, Robert (1909): Jakob von Gunten. Berlin: Cassirer. 

19 Walser, Robert (2005): Jakob von Gunten. Donostia: Erein (translator: Edorta Matauko).

The first example shows how the equivalent of the German verb sich schlängeln
(‘to meander’) and the Spanish verb serpentear is a collocation made up of the
onomatopoeia sigi-saga [zigzag] and the verb egin [to make; to do]. The fact that
a reduplication (astiro-astiro, ‘slowly’) appears in the same sentence in Basque
makes this example even more interesting because it shows (together with other
examples that were found in the corpus but cannot be mentioned in this paper
due to space constraints) that not only onomatopoeia but also reduplications are
a widespread resource in Basque, as mentioned in section 2. In the second example,
the translation of the German collocation mit kleinen Schritten gehen [to walk in
small steps], as identified in the online collocation dictionary compiled by Häcki
Buhofer et al. (2014)²⁰, is the Basque collocation tipi-tapa joan [to walk
pitter-patter]. In the third example, the German idiom aus vollem Hals [loudly] is
translated with the Basque collocation kar-kar barre egin [to laugh loudly]. In the
last case, it can be observed that there is no counterpart for the Basque
onomatopoeia bolo-bolo [everywhere] and that it rather serves to intensify the
meaning of the two phraseological units aus dem Nichts [out of nothing] and aus der
Luft gegriffen sein [something that is plucked out of the air].

The distribution of the above-mentioned translation options across the
different subcorpora is presented in tables 2 and 3. The subcorpora are children's
literature direct translations (CL DI), adult literature direct translations (AL DI),
children's literature indirect translations (CL INDI), and adult literature indirect
translations (AL INDI). Before giving any interpretation of the figures, it is
important to mention that the subcorpora are not equal in size. The largest is AL DI
(1,708,825 words), followed by CL DI (809,301 words); AL INDI (547,773 words) and
CL INDI (463,634 words) are considerably smaller. Therefore, the higher raw count
for AL DI does not in itself show that collocations formed by onomatopoeia and a
verb are more frequent in that subcorpus. Rather, the main conclusion that
can be drawn from the figures in tables 2 and 3 is the predominance of the
translation option No PU-Collocation in all subcorpora, and consequently the
sporadic occurrence of the rest of the translation options.
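One common way to take the unequal subcorpus sizes into account is to express the counts per 100,000 words. The following Python sketch does this with the word counts given above and the column totals of Tab. 2. The normalization, the names `SUBCORPORA` and `per_100k`, and the resulting rates are an illustration added here and are not reported in the study. The rates refer only to the 162 tokens selected for the analysis.

```python
# Word counts as reported in the text; token counts are the column totals of Tab. 2.
SUBCORPORA = {
    "CL DI":   {"words": 809_301,   "tokens": 49},
    "AL DI":   {"words": 1_708_825, "tokens": 83},
    "CL INDI": {"words": 463_634,   "tokens": 18},
    "AL INDI": {"words": 547_773,   "tokens": 12},
}


def per_100k(tokens: int, words: int) -> float:
    """Occurrences per 100,000 words."""
    return tokens / words * 100_000


# The four subcorpora add up to the "around 3.5 million words" of the corpus.
total_words = sum(sub["words"] for sub in SUBCORPORA.values())
print(f"corpus size: {total_words:,} words")

for name, sub in SUBCORPORA.items():
    rate = per_100k(sub["tokens"], sub["words"])
    print(f"{name:8s} {sub['tokens']:3d} tokens  {rate:5.2f} per 100,000 words")
```

The four word counts sum to 3,529,533, which matches the corpus size of around 3.5 million words stated earlier.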


20 http://www.kollokationenwoerterbuch.ch/web/, accessed March 28, 2018.


Tab. 2: Distribution of translation options across subcorpora (raw numbers)


                          CL DI   AL DI   CL INDI   AL INDI   Total
No PU-Collocation            46      72        16        10     144
Collocation-Collocation       2       6         1         0       9
Idiom-Collocation             1       4         1         1       7
Ø-Collocation                 0       1         0         1       2
Total                        49      83        18        12     162


Tab. 3: Distribution of translation options across subcorpora (in percentages)


                          CL DI    AL DI   CL INDI   AL INDI    Total
No PU-Collocation         93.88    86.75     88.89     83.33    88.89
Collocation-Collocation    4.08     7.23      5.56      0.00     5.56
Idiom-Collocation          2.04     4.82      5.56      8.33     4.32
Ø-Collocation              0.00     1.20      0.00      8.33     1.23
Total                    100.00   100.00    100.00    100.00   100.00
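The percentages in Tab. 3 follow from the raw counts in Tab. 2: each cell is divided by the total of its column. The short Python sketch below recomputes them. The names `RAW`, `COLUMNS`, and `column_percentages` are chosen here for illustration only.

```python
COLUMNS = ["CL DI", "AL DI", "CL INDI", "AL INDI"]

# Raw counts from Tab. 2, one list per translation option, ordered as COLUMNS.
RAW = {
    "No PU-Collocation":       [46, 72, 16, 10],
    "Collocation-Collocation": [2, 6, 1, 0],
    "Idiom-Collocation":       [1, 4, 1, 1],
    "Ø-Collocation":           [0, 1, 0, 1],
}


def column_percentages(raw):
    """Return (column totals, table of percentages).

    Each row of the table holds one percentage per subcorpus followed by the
    percentage over the whole corpus, rounded to two decimals.
    """
    totals = [sum(column) for column in zip(*raw.values())]
    grand_total = sum(totals)
    table = {}
    for option, counts in raw.items():
        shares = [round(100 * c / t, 2) for c, t in zip(counts, totals)]
        shares.append(round(100 * sum(counts) / grand_total, 2))
        table[option] = shares
    return totals, table


totals, table = column_percentages(RAW)
print(dict(zip(COLUMNS, totals)))
for option, shares in table.items():
    print(f"{option:24s}", "  ".join(f"{s:6.2f}" for s in shares))
```

Since every percentage is relative to its own subcorpus, the columns can be compared with each other even though the subcorpora differ in size.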


The following sections go beyond these general figures and focus on the nuances
of the different translation options as well as other interesting features observed
during the translation analysis.


5.1 No PU-Collocation 


Despite the undeniable predominance of this translation option, it is necessary
to mention that it is not a homogeneous group and that it deserves more detailed
attention. In some cases, for instance, there is no phraseological unit in the
German source text, but instead verbs with a more or less evident onomatopoetic
origin can be found, such as schwabbeln [to wobble], plappern [to chatter], kitzeln
[to tickle], plantschen [to splash], scheppern [to rattle], brummen [to grumble],
knurren [to growl], and so on. In the first example in table 4, extracted from an
adult literature text, the original uses the verb plappern to express that Mieze is
chattering, and the Basque text employs the onomatopoeia tar-tar together with the
dummy verb esan [to say]. The second example is from a children's
literature text, and the Basque onomatopoeia mar-mar egin was selected as a
counterpart of the German onomatopoetic verb brummen.


Tab. 4: The use of verbs of onomatopoetic origin in the source texts


1. Source text: Und Mieze sitzt auf, faßt ihren Franz um und sieht ihm wonnig ins
   Gesicht und plappert so lauter süßen Quatsch und bettelt und bettelt: (BAde)
   [And Mieze sits up, takes her Franz and looks him in the face with delight and
   chatters, saying nothing but sweet nonsense and begs and begs.]
   Target text: Eta Miezek bizkarra tentetu, besarkatu bere Franz eta begiratzen dio
   aurpegira bozkariotsu eta tar-tar esaten dizkio sekulako txorakeria goxoak eta
   arren eta arren: (BAeu) [And Mieze stiffens her back, hugs her Franz and looks at
   his face with joy and does not stop telling him some incredible sweet nonsense
   and begs and begs.]

2. Source text: “Ja, ja”, brummte Tobi, der sich mindestens fünf Jahre zu alt für
   eine Holzeisenbahn fühlte. (SLHde)²¹ [“Yes, yes,” grumbled Tobi, who felt at
   least five years too old for a wooden train.]
   Target text: “Bai, bai”, mar-mar egin zuen Tobik, zeini zurezko trenekin
   jostatzeko gutxienez bost urte zaharregia zela iruditzen zitzaion. (SLHeu)²²
   [“Yes, yes,” grumbled Tobi, who felt at least five years too old to play with a
   wooden train.]


On other occasions, there is not just one verb, but two (of onomatopoetic or
non-onomatopoetic origin) in the German source text, and the translators decided to
use a collocation with onomatopoeia in the Basque target texts. Two different
examples can be found in table 5: In the first example, the meaning of the German
verbs rappeln and rattern [to rattle] is represented by the Basque word combination
triki-traka ibili [to clatter], while in the second example, the above-mentioned
verb plappern is used twice in succession. The Basque translator decided to
triplicate²³ the onomatopoeia by using tar-tar-tar, thus emphasizing that she does
nothing but chatter.


21 Sommer-Bodenburg, Angela (1991): Schokolowski. Lustig ist das Hundeleben. Munich: Ber- 
telsmann. 

22 Sommer-Bodenburg, Angela (1996): Txokoloski. Dibertigarria da txakurren bizitza. Bilbao: 
Desclée de Brouwer (translator: Edurne Azkue). 

23 Ibarretxe-Antuñano (2012: 138) refers to this structural feature as triplicación total or total
triplication.


Tab. 5: The use of two verbs of onomatopoetic origin in the source texts


1. Source text: Neben ihm rappelte und ratterte schon der Kompressor. (JVde)²⁴
   [Beside him, the compressor was already rattling.]
   Target text: Ondoan konpresorea triki-traka zarataka zebilen jada. (JVeu)²⁵
   [Beside the compressor was already clattering.]

2. Source text: Sie plappert, plappert, legt den Kopf um seinen Hals, [...]. (BAde)
   [She chatters, chatters, puts her head around his neck.]
   Target text: Tar-tar-tar ari da, jarri du burua mutilaren lepoan, [...]. (BAeu)
   [She does not stop talking, puts her head on the boy's neck.]


In many cases, the most natural, direct, and straightforward correspondent of the
German verb is the collocation with onomatopoeia, but in other cases, the Basque
translation option seems more phraseological or expressive than the original. As
can be seen in table 6, the first example is extracted from an indirect translation.
Both the German and Spanish texts contain a single verb (trinken, beber, ‘to
drink’), while the translator of the Basque version uses the onomatopoeia to
describe in more detail how the drinking is performed: zanga-zanga
[in gulps]. A very similar situation can be observed in the second example: The
equivalent of the German verb essen [to eat] is not just the Basque verb jan;
the onomatopoeia also describes how the fish is being eaten by Jasper: mauka-mauka
[voraciously, greedily].


Tab. 6: Examples of more phraseological or expressive options in the target texts


1. Source text: Er schimpft, reißt den Kühlschrank auf, schnappt sich das
   Mineralwasser und trinkt direkt aus der Flasche. (HLJde) [He scolds, tears open
   the fridge, grabs the mineral water and drinks directly from the bottle.]
   Intermediary text: Dani abre la nevera de golpe y bebe directamente de la botella
   de agua mineral. (HLJes) [Dani opens the fridge at once and drinks directly from
   the bottle of mineral water.]
   Target text: Danik hozkailua kolpetik ireki eta ur minerala zanga-zanga edaten du
   botilatik bertatik. (HLJeu) [Dani opens the fridge at once and drinks in gulps
   the mineral water directly from the bottle.]

2. Source text: Jasper fanden wir bei Mama in der Küche. Er aß den gestern
   verschmähten Tiefkühlfisch. (DAKde)²⁶ [We found Jasper in the kitchen with mum.
   He ate the yesterday spurned frozen fish.]
   Intermediary text: (none given)
   Target text: Jasper, amarekin sukaldean aurkitu genuen, bezperan nahi izan ez
   zuen arraina mauka-mauka jaten. (DAKeu)²⁷ [We found Jasper in the kitchen with
   mum, eating voraciously the fish he did not want yesterday.]


24 Massanek, Joachim (2003): Juli, die Viererkette. Frankfurt am Main: Baumhaus.
25 Massanek, Joachim (2010): Juli, defensa onena. Bilbao: Gero (translator: Nuria Sebrango).


The following translation options in the Basque target texts, extracted from
indirect translations, could be described as more phraseological if only the
original German text and the Basque translation were compared, but the
intermediary versions make it obvious that the Basque translation has been
influenced by the Spanish version. In the first example, the sun rests [ruhen] in
the German text, while in the Spanish and Basque versions it sparkles [centellear,
diz-diz egin]. In the second example, the path leads [führen] somewhere in the
German original text, and in the Spanish and Basque versions it meanders or zigzags
[serpentear, sigi-sagan igaro].


Tab. 7: Examples of the influence of the intermediary texts on the Basque translations


1. Source text: Wenn das liebe Tal um mich dampft, und die hohe Sonne an der
   Oberfläche der undurchdringlichen Finsternis meines Waldes ruht, und nur einzelne
   Strahlen sich in das innere Heiligtum stehlen [...]. (DLWde)²⁸ [When the dear
   valley steams around me, and the high sun rests on the surface of the
   impenetrable darkness of my forest, and only single rays steal into the inner
   sanctum.]
   Intermediary text: Cuando el valle se vela en torno mío con un encaje de vapores;
   cuando el sol de mediodía centellea sobre la impenetrable sombra de mi bosque sin
   conseguir otra cosa que filtrar entre las hojas algunos rayos hasta el fondo del
   santuario; (DLWes)²⁹ [When the valley veils around me with a lace of vapors; when
   the midday sun sparkles on the impenetrable shadow of my forest, achieving
   nothing but filtering some rays through the leaves to the bottom of the
   sanctuary.]
   Target text: Harana nire gainean lurrinezko mihiztaduraz barrandatzen denean;
   eguerdiko eguzkiak nire basoaren gerizpe sarkaitzaren gainean, hostoen artean eta
   santutegiko ostalderaino izpi batzu iraiztea baino lortu ez dela, diz-diz egiten
   duenean; (DLWeu)³⁰ [When the valley lurks over me with a lace of steam; when the
   midday sun sparkles on the impenetrable shadow of my forest, achieving nothing
   but filtering some rays through the leaves and to the sanctuary's heaven.]

2. Source text: Dieser Weg, der direkte Weg nach Napoule, führte an den Ausläufern
   des Tanneron entlang durch die Flußsenken von Frayere und Siagne. (DPGMde)³¹
   [This path, the direct path to Napoule, led along the foothills of the Tanneron
   through the river valleys of the Frayere and the Siagne.]
   Intermediary text: Este camino, el camino directo a Napoule, serpenteaba por las
   estribaciones del Tanneron, cruzando las cuencas de Frayére y Siagne. (DPGMes)³²
   [This path, the direct path to Napoule, meandered through the foothills of the
   Tanneron, through the basins of the Frayere and the Siagne.]
   Target text: Bide hura, La Napoulera zuzenean zihoan bidea, sigi-sagan igarotzen
   zen Tanneronen oinetik, Frayere eta Siagne ibaien arroetan barrena. (DPGMeu)³³
   [This path, the direct path to Napoule, zigzagged through the foothills of the
   Tanneron, through the basins of the Frayere and Siagne rivers.]


26 Nöstlinger, Christine (1982): Das Austauschkind. Weinheim: Beltz & Gelberg.

27 Nöstlinger, Christine (1991): Ingeles bat etxean. Lizarra: Elkar (translator: Xabier
Mendiguren).

28 Goethe, Johann Wolfgang von (1774): Die Leiden des jungen Werther. Leipzig: Weygand.


5.2 Collocation-Collocation


A special case of this translation option is represented by the example in table 8. 
The Basque children's literature text has been translated indirectly through the 
Spanish text. However, in this case, the collocation is found in the German text 
(laut klopfen, ‘to knock loudly’) and not in the Spanish text, where a single verb 
has been used (dar, ‘to knock’). Thus, given the indirect character of the Basque 
translation, it may be argued that, with the use of the word combination kaska- 
kaska jo [to tap] as an equivalent of the Spanish verb dar, a more phraseological 
target text is created. 


29 Goethe, Johann Wolfgang von (1835): Las desventuras del joven Werther. Barcelona: Apolo
(translator: José Mor de Fuentes).

30 Goethe, Johann Wolfgang von (1987): Werther. Donostia: Kriselu (translator: Gotzon Lobera).
31 Süskind, Patrick (1985): Das Parfum. Zürich: Diogenes.

32 Süskind, Patrick (1985): El perfume. Barcelona: Círculo de lectores (translator: Pilar Giralt).
33 Süskind, Patrick (2007): Perfumea. Irun: Alberdania (translator: Miren Arratibel).


Tab. 8: Example of the translation option Collocation-Collocation in an indirect translation


Source text: denn ein Gast klopft laut mit dem Messer ans Glas und will zahlen.
(EDde)³⁴ [because a guest knocks loudly with the knife on his glass and wants to pay.]
Intermediary text: porque un parroquiano daba con el cuchillo en la copa como el que
quiere pagar. (EDes) [because a customer knocks with the knife on his glass as if he
wants to pay.]
Target text: bezero bat, ordaindu nahi zuenarena eginez, aiztoarekin basoa jotzen ari
bait zen kaska-kaska. (EDeu) [because a customer, as if he wanted to pay, was tapping
his glass with the knife.]


5.3 Idiom-Collocation 


In the example cited in table 9, which represents an indirect translation, the id- 
iom is also found in the German source text (wie Kraut und Rüben liegen, ‘to be 
higgledy-piggledy’), and not in the intermediary text from which the Basque 
translation originates. However, since the meaning changes in the Spanish ver- 
sion [to curl up] with respect to the original version, the Basque (which matches 
the meaning of the Spanish text) and German versions also differ semantically. 


Tab. 9: Example of the translation option Idiom-Collocation with a difference in meaning due to
indirectness


Source text: Die Jungen warfen sich zu Boden und lagen wie Kraut und Rüben
durcheinander. (EDde) [The boys threw themselves to the ground and lay
higgledy-piggledy.]
Intermediary text: Y los chicos se tiraron al suelo y se quedaron muy acurrucaditos.
(EDes)³⁵ [And the boys threw themselves to the ground and they were very curled up.]
Target text: Mutikoak etzan eta kuzkur-kuzkur eginda geratu ziren. (EDeu)³⁶ [The boys
lay down and they were very curled up.]


In the next and final example, however, idioms can be observed both in the orig- 
inal and in the intermediary text: mit vollen Backen (verzehren) (‘to eat with 


34 Kästner, Erich (1929): Emil und die Detektive. Zurich: Atrium.

35 Kästner, Erich (1967): Emilio y los detectives. Barcelona: Juventud (translator: José
Fernández).

36 Kästner, Erich (1991): Emilio eta detektibeak. Donostia: Elkar (translator: Tomás Sarasola).

stuffed cheeks’) and (devorar) a dos carrillos (‘gobble food down’). The Basque
translator used a collocation formed by onomatopoeia and a verb (mauka-mauka
jan, ‘to eat voraciously, greedily’) to represent the same meaning.


Tab. 10: Example of the translation option Idiom-Collocation with no difference in meaning


Source text: [...] und, wenn sie das gewünschte endlich erhaschen, es mit vollen
Backen verzehren und rufen: “mehr!” (DLWde) [and when they finally catch what they
desire, eat it with stuffed cheeks and shout: “more!”.]
Intermediary text: [...] cuando logran atrapar el manjar apetecido lo devoran a dos
carrillos y gritan: “¡Más!” (DLWes) [and when they finally catch the desired
delicacy, they gobble it down and shout: “more!”.]
Target text: [...] gura izandako jatena harrapatzen lor dezatenean mauka-mauka jan
eta “gehiago” deiadar egiten dutenak direla. (DLWeu) [when they finally catch the
desired delicacy, they are the ones who eat it voraciously and shout “more”.]


6 Conclusions 


This paper has focused on specific phraseological units found in the Basque lan- 
guage. Basque has a weak tradition of written literature and a short history of its 
standard variety, which coexists diglossically with other major languages and is 
still in the process of standardization. Given these features, it is understandable 
that there has been no systematic research in the field of Basque phraseology and 
that even less attention has been paid to the study of the translation of phraseo- 
logical units from/into Basque. However, as presented in the introduction of this 
paper, there are some projects worth mentioning (Aierbe 2008; Sanz-Villar 2015; 
Iñurrieta et al. 2016), and the objective of the present article has been to make a 
further contribution toward research in this field. 

Altzibar et al. (2011) identified collocations formed by partially or totally re- 
duplicated onomatopoeia and a verb as a special type of formulaic pattern in 
Basque, and the actual use of those collocations has been analyzed in this paper 
from a translational perspective. By doing so, we have encountered both theoret- 
ical and methodological challenges since there was no similar previous study 
that would have served as a reference. From a theoretical perspective, it has not 
always been easy to identify the onomatopoeia part of the collocation as actual 
onomatopoeia or to establish the boundaries between collocations and free word 
combinations. In this sense, the onomatopoeia dictionary of Ibarretxe-Antuñano
and Martínez Lizarduikoa (2006) as well as Ibarretxe-Antuñano’s papers (2006,
2012) on this topic were of great help and served as a constant reference.

Methodologically, the first challenge was to create a corpus from scratch that
would meet the requirements to conduct German-into-Basque translation analyses
of literary texts. Although at present there are a number of very diverse
Basque corpora available to any user, the type of corpus we needed (a digitized,
parallel, and multilingual corpus consisting of German original texts, intermediary
versions in the case of indirect translations, and Basque target texts) not only
needed to be built from scratch but also required the creation of a specific tool,³⁷
which allowed for the simultaneous alignment of several texts. Due to the lack of
precedent, another methodological challenge involved the extraction of the
phraseological units under analysis. This was solved thanks to a lemmatization and
POS-tagging tool developed by members of the IXA group that works, among other
languages, with Basque, and an open-source tool that requires some knowledge of
computational linguistics but can be used with any language. The author is aware of
the fact that the corpus probably contains more collocations formed by
onomatopoeia that were not extracted using this method, because, for instance, the
verb does not always appear right after the onomatopoeia. However, a great number
of them were extracted, and this may be seen as a first step toward an
exhaustive analysis.

The translation analysis has shown that, despite the predominance of the 
translation option No PU-Collocation, the nuances that are hidden behind it are 
of great significance from a translational point of view. Sometimes the use of the 
collocations seemed to be the most natural way of rendering the content of the 
source text in the target text, but at other times, as exemplified in table 6, the use 
of the Basque collocations resulted in a more phraseological text. When analyzing
somatic phraseological units (Sanz-Villar 2018), it was concluded that in the
case of indirect translations there is "a tendency to deviate from the Spanish
intermediary version and create more typical Basque texts." The examples in tables
6 and 8, as well as others that could not be presented in this paper, corroborate
this hypothesis, but more examples should be analyzed. One thing is clear: during
the whole process, from the creation of the catalog to the translation analysis,
it was crucial to take into account that indirect translations are a reality in
German-into-Basque translations.


37 I hereby thank the computer technician Iñaki Albisua for his dedication to the alignment tool 
over the past years. 
Part II: Languages Spoken outside Europe


Andreas Buerki 
(How) is Formulaic Language Universal? 
Insights from Korean, German and English 


Abstract: Items of formulaic language, also referred to as phraseological units or 
common turns of phrase, are in evidence in a very large number of languages. 
However, the extent to which languages feature such formulaic material is un- 
clear. Similarly, how formulaicity may be understood across typologically differ- 
ent languages and whether indeed there is a concept of formulaic language that 
applies across languages, are questions which have not generally been dis- 
cussed. Using a novel data set consisting of topically matched corpora in three 
typologically different languages (Korean, German and English), this study pro- 
poses an empirically founded universal concept for formulaic language and dis- 
cusses what the shape of this concept implies for the theoretical understanding 
of formulaic language going forward. In particular, it is argued that the nexus of 
the concept of formulaic language cannot be fixed at any particular structural 
level (such as the phrase or the level of polylexicality) and incorporates elements 
specified at varying levels of schematicity. This means that a cross-linguistic con- 
cept of formulaic language fits in well with a constructionist view of linguistic 
structure. 


1 Introduction 


In this chapter, I set out to assess whether formulaic language (FL) can be re- 
garded as universal in a comprehensive sense, and if so, what such a universal 
concept of FL looks like. To make this assessment possible, data from Korean, 
German and English are used — between them, these languages cover the spec- 
trum of morphological typology, which is arguably the most pertinent typological 
classification when it comes to FL. 

One way of characterising FL is to say that it represents habitual turns of 
phrase in a speech community (cf. Burger et al. 1982: 1; Coulmas 1979; Erman and 
Warren 2000; Fillmore et al. 1988; Howarth 1998: 25; Langacker 2008: 84; Pawley 
2001). Such typical ways of putting things may include conversational formulae 
(e.g. Thank you very much - not at all), collocations (like face a challenge, or utter 
disgrace), multi-word terms (open letter, contempt of court) as well as other habitual
sequences (half an hour, no chance of X, behind closed doors) and, to the extent
to which they are in recurrent use within a community, idioms (like get one's
knickers in a twist) and even proverbs (garbage in, garbage out). 

FL is held to be of central importance to the functioning of language in a 
number of key ways. For example, besides making up a sizable portion of lan- 
guage in use (Altenberg 1998; Butler 2005: 223), knowledge of FL is thought a 
prerequisite for full proficiency in a language, register, dialect or sociolect. This 
is because habitual turns of phrase are crucially only a subset of all expressions 
that might be judged grammatical (e.g. Bally 1909: 73; Pawley and Syder 1983: 
191; O'Keeffe et al. 2007: 60) and so knowledge of the boundaries of grammatical- 
ity alone is insufficient. FL is also thought to ease processing load during lan- 
guage production and thus it is nothing less than a key enabler of fluency in lan- 
guage (Nattinger and DeCarrico 1992, Pawley and Syder 1983, Wray and Perkins 
2000). Further, research suggests that FL is key to successful mutual understand- 
ing in communication because items of FL activate a range of social, situational 
and cultural contextual cues (Erman 2007: 26; Feilke 1994, 2003: 213, Wray 2008: 
20-21). Hence even in lingua franca communication among L2 speakers, commu- 
nities move fast to establish a stock of FL to aid mutual understanding, as shown 
by Seidlhofer (2009). In short, much in language depends on FL. 

It is likely that items of FL are found in languages universally (Colson 2008: 
191). Previous phraseological research has established the existence of FL phe- 
nomena in very many different languages, including all major European lan- 
guages and less widely spoken European languages and dialects (cf. overview in 
Burger et al. 2007: part XIV and the survey of 74 European and 17 non-European 
languages in Piirainen 2012) as well as Arabic (Abdou 2011), Catalan (Bladas 
2012), Chinese (Shei and Hsieh 2012), Hebrew (Al-Haj et al. 2014), Hindi (Shama 
2017), Japanese (Namba 2010), Korean (Kim et al. 2001), English as a Lingua 
Franca (Kecskes 2007; Seidlhofer 2009) and indeed artificial languages like Esperanto,
Interlingua and Ido (cf. Fiedler 2007), to name only a few of the more recently
investigated varieties (see also major comparative works including the recent
Idström and Piirainen 2012; Benigni et al. 2015 and the large number of
monolingual and multilingual phrasebooks and idiom dictionaries, e.g. anon.
2010; Cownie 2001).1 Consequently, there is every reason to expect that languages


1 Arguably even programming languages feature items of FL that represent habitual ways of 
coding tasks in a programming language (cf. programming idioms, e.g. in Maruch and Maruch 
2011: ch. 21). 
that have not yet had their phraseology documented will nevertheless be shown
to feature FL. 

Crucially, however, the points made above regarding the importance of FL to 
the functioning of language in general require that FL is not only found in all 
languages but found in comparable measure in all languages (universality in the 
comprehensive sense): it would be difficult to maintain that some languages fea- 
ture a greater density of habitual ways of expression than others (all else being 
equal) or that fluency and mutual understanding are better or more easily achieved
in some languages than others by virtue of their higher rate of FL occurrence. To 
date, no quantitative cross-linguistic studies have confirmed whether FL is in- 
deed found in similar measure across different languages or whether the degree 
of reliance on FL in fact varies between languages and language varieties, though 
results of some studies appear to point to non-universality of FL in the compre- 
hensive sense (e.g. Kim 2009). 

This is a matter of very considerable consequence for the study of FL: if it 
were found that different languages rely on FL to very differing degrees, widely- 
accepted theoretical claims about the importance and role of FL (such as those 
outlined above) would require a fundamental re-examination — there would be a 
strong possibility that FL may in fact be a mere epiphenomenon, a language-spe- 
cific reflex of a more general, yet to be formulated principle that manifests itself 
differently in different languages, rather than a phenomenon of theoretical inter- 
est in itself. If, on the other hand, a coherent concept of FL can be formulated that 
is equally valid across typologically diverse languages, it would reaffirm the sig- 
nificance of FL in linguistic theory and contribute substantially to an understand- 
ing of FL that is able to sustain the continued expansion of phraseological re- 
search into new domains and its application to new data. 

In the following, I will first outline some of the main ways in which FL has 
been understood. Then previous research relevant to the question of comprehen- 
sive universality will be reviewed, along with the relevant concepts of linguistic 
typology. The data and procedure section subsequently outlines how a trilingual, 
topic-matched corpus of around 80 million words of Korean, German and English 
was put together and how it was used to test the universality of the concept of FL. 
In the final two sections, results of this analysis are presented and their signifi- 
cance discussed. 
2 Background


2.1 Formulaic Language 


There is a range of current understandings of and approaches to FL and phrase- 
ological phenomena. This complicates any statements made about FL in general 
because it raises the question of which understanding of FL those generalisations
apply. But the plurality of understandings is also a sign of the multi-faceted na- 
ture of the phenomenon at hand which invites a diversity of approaches and con- 
ceptualisations and it is an index of the vitality of research into FL which attracts 
scholars from diverse fields, and with diverse interests and research agendas. 

At the risk of a degree of oversimplification, it is nevertheless possible to 
identify main strands of thinking on FL which I will do by reviewing three main 
approaches: the traditional phraseological, the psycholinguistic and the corpus 
linguistic. Traditional phraseology considers the criterion-triplet of polylexicality 
(i.e. items involving more than one word), idiomaticity (semantic and/or syntac- 
tic irregularity) and fixedness (or stability) of key importance in conceptualising 
FL (cf. Burger et al. 1982). While the criteria of polylexicality and fixedness are 
common to most concepts of FL, the prominence of the criterion of idiomaticity 
has meant that idioms and proverbial expressions, although shown to be com- 
paratively infrequent in language use (Moon 1995: V, 1998: 81; Colson 2007), have 
tended to be a particular (and occasionally exclusive) focal point within this 
strand of thinking. In the second strand, here dubbed psycholinguistic, the as- 
pect of mental processing features particularly prominently. Sinclair described 
relevant entities as “phrases that constitute single choices” (1991: 110), and the 
idea of prefabrication is prominent in this strand, as for example in Wray’s defi- 
nition of a formulaic sequence as 


a sequence, continuous or discontinuous, of words or other elements, which is, or appears 
to be, prefabricated: that is, stored and retrieved whole from memory at the time of use. 
(Wray 2002: 9) 


Since processing occurs in individuals’ heads, formulaicity in this view might be 
understood primarily as a feature of idiolect rather than of the shared language 
system. The final line of thinking focuses on the aspect of conventionality in re- 
lation to speech communities, as manifested in language use. This can be 
summed up in the characterisation of FL as expressions that represent habitual 
ways of putting things in a community. Early formulations referred to “combinations
sanctioned by usage” (Bally 1909: 73, my translation), while more recent
work in this line of thinking has described FL as conventional or institutionalised 
phrases (Pawley 2001: 122; Bybee 2010: 35, respectively; cf. also Howarth 1998:
25; Brunner and Steyer 2007: 2). In specifically corpus-linguistic work, conventionality
is typically measured via variously modulated measures of frequency of
occurrence, as in the pioneering study by Altenberg and Eeg-Olofsson (1990) that 
made clear that idiomatic sequences are vastly outnumbered by conventional, 
non-idiomatic sequences that should nevertheless be considered instances of FL. 
While conceptions of FL and their associated terminologies are therefore di- 
verse, they also coincide in key characteristics, such as their tendency to involve 
units larger than words that display stability of form across instances of use. 
Views diverge on the importance of idiomaticity and on whether the mental pro- 
cessing of individuals or the shared conventions of a language community are 
the most relevant aspects of FL. Although the approach followed in this study is 
an inclusive, corpus-linguistic approach of the third strand, the commonalities 
between strands ensure that conclusions drawn are relevant to FL in general.2


2.2 Universality and Typological Difference 


Above it was pointed out that previous research has established the existence of 
items of FL in a diverse range of languages. It was also argued that if the well- 
established theoretical claims about FL are to be maintained, the mere existence 
of tokens of FL in the languages of the world provides insufficient support for the 
universality of the full concept of FL. Only evidence of comprehensive universal- 
ity (i.e. of comparable levels of recourse to items of FL across languages) would 
confirm the central importance of FL to the functioning of language in general. 
In the following, therefore, the focus will be on previous research that throws 
light on aspects of this comprehensive type of universality. 

Although generally the concept of FL is most often treated as cross-linguisti- 
cally unproblematic in FL literature, a number of authors have overtly com- 
mented on aspects of comprehensive universality and cross-linguistic concepts of 
FL. Wray (2002), for example, offers comments about the influence of flexible 
word-order on the nature of FL and highlights the fundamental nature in which 
typological differences can affect FL: 


2 Although, due to the rarity of narrowly idiomatic expressions among all items of FL, results 
will be less relevant to a conception of phraseology that is concerned exclusively with items dis- 
playing semantic and/or syntactic irregularity. 
While an English phrase might be fully fixed except for, say, the verb morphology, its German
equivalent might need to contain two slots for the verb, with one or the other being
filled according to the syntactic environment. 

(Wray 2002: 269; similarly Heid 2012 and others) 


Like most theoretical works, however, Wray otherwise presents findings in terms 
of properties of language in general while drawing primarily on a single lan- 
guage. Although some theoretical treatments are more circumspect when sug- 
gesting generalisations across languages (e.g. Fellbaum 2007: 2), the discussion 
of cross-linguistic aspects is largely left to one side, leading to Colson’s perceptive 
comment that “[o]n the basis of European syntax, we may have a slightly biased 
view of what phraseology looks like in other [i.e. non-European] languages" (Col- 
son 2008: 193). 

Specifically cross-linguistic studies have overwhelmingly focussed on 
strongly idiomatic items of FL (for overviews and discussion of contrastive phra- 
seology see esp. Colson 2008; also Burger et al. 2007: part XIII; Földes 1997) and 
have uncovered findings particularly relating to the figurative semantics of idi- 
oms and their possible implications for universal tendencies in human cognition 
(e.g. Dobrovol’skij and Piirainen 2005) and how widely similar idioms are shared 
between languages (e.g. Piirainen 2012). Other studies discussing cross-linguistic 
aspects (e.g. Butler 1997, 2005; Cortes 2008; Granger 2014) have presented in- 
sightful comparisons of form and function in items of FL, typically across pairs of 
languages. Though none of these studies directly address the question of com- 
prehensive universality or propose adjustments to the concept of FL based on 
their comparisons, Granger observes in relation to lexical bundles in French and 
English that “the overall number of n-grams may differ across languages” (2014: 
61) and that 


a lexical bundle approach [to FL] is likely to generate more interesting results if the lan- 
guages compared are sufficiently close morphologically, lexically and syntactically. 
(Granger 2014: 61) 


However, Granger views this and similar issues caused by “typological differ- 
ences between languages” (60) as methodological problems to which solutions 
need to be found rather than matters of theoretical importance. Similarly, Kim 
(2009) in a comparison of Korean lexical bundles of three-word length in conver- 
sation and academic texts, finds that “in Korean, [...] lexical bundles are gener- 
ally rare overall due to the wide range and variety of word endings" and “[t]he 
findings of the current study [...] suggest that typological differences are obvi- 
ously central to any explanation of these differences" (2009: 157). The fundamen- 
tal questions this raises regarding the nature of FL are not discussed. 
On the other hand, Durrant’s (2013) study of formulaicity in Turkish offers
important insights regarding the nature of FL in relation to language typology: 
based on extensive corpus evidence, he demonstrates formulaicity at the mor- 
pheme sequence level and suggests that in agglutinating languages, this may 
pick up the shortfall in the number of recurring multi-word items of FL. Durrant 
maintains that 


[s]ince individual word forms are rare, so too are high-frequency word combinations. [...] it
may be that collocation is better described as relationships between lemmas, or between 
specifiable subsets of a lemma, or even between suffix combinations, abstracted from lexi- 
cal roots. 

(Durrant 2013: 34) 


Similar insight may be gleaned from treatments of FL in languages that do not 
mark word boundaries orthographically, such as Chinese. In Chinese orthography,
characters represent single “syllables associated with a morpheme” (Sun
2006: 102) and are not grouped orthographically into words. Since morphemes
are furthermore “more indeterminate with respect to their bound [or] free status”
(Sun 2006: 46), the word “is neither a particularly intuitive concept nor easily
defined” (Sun 2006: 46-49), creating immediate problems for the FL-criterion of
polylexicality. Hence Shei and Hsieh, when describing items of FL in Chinese,
place the locus of formulaicity at the morphological level: they point out that
“there are traditionally a huge number of four-morpheme units called cheng2yu3
([...] “established language”, “idiom”) [...] used to show erudition or simply for
succinct meaning making” (2012: 327), but that the “issue of large habitually
formed morpheme groups [...] is not so well investigated to date” (2012: 328). They
then proceed to outline a “method which can separate idiomatic expression from 
ad hoc polysyllabic [i.e. polymorphemic] strings” (2012: 328), operating, again, at 
the morpheme level. 

In summary, discussions of cross-linguistic aspects of FL, where they have 
occurred at all, have rarely engaged with the question of comprehensive univer- 
sality or the concept of FL that might underlie it. The studies that have compared 
semantic, functional and structural aspects of items of FL have not, in general, 
commented on the effects of differing morphological behaviour among lan- 
guages on the concept of FL or on quantitative aspects, leading to the apparent 
assumption that existing understandings of FL are unproblematically universal. 
Kim (2009), Granger (2014) and Durrant (2013) have shown, however, that this 
cannot be assumed and that models of FL as recurrent strings of word-forms, for 
example, are unlikely to be universal in the comprehensive sense. Consequently, 
the question of how recurrent complex units should be conceived of, across very 
different languages, has so far not been investigated at anything approaching the
depth which would be necessary to support the ambitious research programme 
that is currently pursued in the area of FL, or indeed to safeguard the theoretical 
importance currently attached to the concept of FL. The next section lays out how 
the question of whether, and if so in what way, FL is comprehensively universal 
was assessed in this study. 


3 Data and Procedure 


How does one work out whether and how FL is universal in the comprehensive 
sense? The approach taken in this study is a quantitative, corpus-linguistic one 
involving three basic steps: in a first step, a novel genre, topic, structure and size- 
matched trilingual corpus of languages representative of the breadth of diversity 
found across morphological typology was compiled. Next, comprehensive auto- 
matic extractions of items of FL from each of the languages represented in the 
corpus were carried out, employing various candidate universal concepts of FL 
and measuring their effects. In the third and final step, the concept of FL that 
succeeded in yielding a closely comparable number of extracted items of FL a- 
cross the three language sections of the corpus (thus simulating a comprehen- 
sively universal concept of FL) was assessed in terms of whether it is a theoreti- 
cally viable concept of FL or one that does not form a plausible basis for the shape 
of a universal concept of FL. In the former case, the relevant simulated universal 
FL concept would furnish the basis for an explanation of how FL is universal; the 
latter case would suggest that FL is not universal in the comprehensive sense. 


3.1 Corpus Compilation 


In compiling a corpus for present purposes, a range of features needed to be con- 
sidered to obtain valid results. The most fundamental of these was the choice of 
languages compared. Known factors likely to influence FL-density, including 
genre, topic and corpus size, also needed controlling across the different langua- 
ge sub-corpora. 

The languages chosen for the comparison were Korean, German and English. 
Languages can be classified in various ways according to a multitude of features. 
Some of the more common linguistic typologies have classified languages ac- 
cording to word order, vocabulary or morphological type. While all of these cri- 
teria will influence FL to some extent, in this study, morphological classification 
was used as the basis for source data selection as this type of variation is clearly
pertinent to FL (cf. below as well as Durrant 2013; Granger 2014). 

The discussion of morphological typology in this section essentially follows 
Whaley (1997: 127-148). Morphological typology can be understood as a classifi- 
cation of the morphological behaviour of a language on two semi-independent 
continua. One is the continuum of synthesis (or morphemes per word ratio), with 
isolating languages (few morphemes per word) at one extreme and synthetic ones 
(many morphemes per word) at the other. The other continuum is that of fusion, 
with agglutinating languages (where individual morphemes remain recognisable 
as they are combined) at one end and fusional or (in)flectional languages, where
morphemes typically merge with one another, at the other extreme. Languages
are placed at different points on the continua according to their tendencies which 
are, however, not necessarily uniform (Song 2001: 43). Korean, German and Eng- 
lish take up different positions on the continua and therefore represent the 
breadth of diversity found across morphological typology: English is the most 
isolating language of the three, whereas German is more synthetic and also more 
fusional. Korean is yet more synthetic, though unlike German it is agglutinating 
(Sohn 2001: chapter 8). The situation is roughly sketched in figure 1. 
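
The synthesis continuum can be made concrete with a small calculation. The following Python sketch computes a morphemes-per-word ratio from words that have already been segmented into morphemes. The function name `synthesis_index` and the hand-segmented toy input are assumptions made purely for illustration; they are not taken from the study or from Whaley (1997).

```python
def synthesis_index(segmented_words):
    """Return the mean number of morphemes per word.

    `segmented_words` is a list of words, each given as a list of morphemes.
    """
    if not segmented_words:
        raise ValueError("at least one word is required")
    total_morphemes = sum(len(word) for word in segmented_words)
    return total_morphemes / len(segmented_words)


# Hypothetical, hand-segmented toy input (illustration only).
toy_words = [["the"], ["cat", "s"], ["un", "break", "able"]]
ratio = synthesis_index(toy_words)  # 6 morphemes spread over 3 words
```

A language nearer the isolating end would yield a ratio close to 1 on such a measure, while a more synthetic language would yield a higher value.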


[Figure 1 shows two scales: the continuum of synthesis (morphemes per word), running
from isolating to synthetic, and the continuum of fusion (meanings per morpheme),
running from fusional to agglutinating, with E, G and K placed on each.]

Fig. 1: Continua of synthesis and fusion. Note: E=English, G=German, K=Korean; placements
are approximate


It is well known that genre and register influence the types of FL found, but cru- 
cially here, genres are also known to differ in the degree to which they rely on FL 
(Ädel and Erman 2012: 81; Biber 2006, 2009; Biber and Barbieri 2007; Biber et al.
2003; Kuiper 2009; Lenk and Stein 2011; Stein 2007). Topic may well have similar 
effects and it was therefore decided to control for topic as well as genre. To ex- 
clude possible effects of both, while avoiding the complications of translated 
texts, the sub-corpora for each language were drawn from Wikipedia articles, 
with 75% of articles in each language being on shared topics and 25% of articles 
on topics not covered by the respective other languages (see table 1 for an overview).3
A 100% match of topics would not have been feasible as article topics in
some languages correspond to sections of more general articles in others and vice 
versa. It was also thought important to capture a proportion of language used to 
discuss indigenous topics, as it were, because shared topics would inevitably be 
more globalised in nature. Since several shorter texts may not be equivalent to a 
single text of the same total size in important respects, corpus structure in terms 
of number of documents was also matched across language sub-corpora. Where 
necessary, articles had random paragraphs removed in order to match sub-cor- 
pora in both overall size and in the number of documents included. 


Tab. 1: Corpus composition. Note: SID = syllable information density; shared docs = docu- 
ments with shared topics across languages 


         total docs  shared docs  syll. count  word count  SID
Korean   63,075      40,545       67,164,785   25,021,576  1
German   63,075      40,349       55,840,652   28,636,204  1.203
English  63,075      40,501       48,004,421   29,077,310  1.4


Perhaps the most obvious factor to be controlled was sub-corpus size. Tradition- 
ally in corpus linguistics, size is measured in number of words. However, words 
are not cross-typologically stable units. As laid out in the above discussion on 
morphological typology, isolating languages tend to split morphemes into many 
words while synthetic languages pack many morphemes into single words result- 
ing in situations where whole phrases in isolating languages like English are 
equivalent to single words in highly synthetic languages like Korean with obvious
implications for measurements of corpus size.4 A measure of corpus size independent
of the concept of ‘word’ was therefore required and a measure based
on syllabic information density (SID) was chosen instead. SID (Pellegrino et al. 


3 Translation across Wikipedia pages in various languages does occur, but “articles in the different
versions are often written directly in the respective target-language” (McDonough Dolmaya
2015: 16). Warncke-Wang et al. (2012) found that of the 1,253,523 articles of the German
Wikipedia, only 0.306% were translations, and only 0.267% of the English Language Wikipedia.
In any case, however, due to article creation and editing being collaborative and continuous,
even articles with translation activity at a certain stage in their history are not likely to be trans- 
lated texts in any conventional sense. 

4 The concept of a word is problematic from a theoretical point of view, both within and even 
more so across languages (cf. Dixon and Aikhenvald 2002). 
2011; Oh et al. 2013) measures the amount of information packed into a syllable
and then allows for corpus size to be specified on the basis of a density-adjusted 
number of syllables, rather than words, leading to a balanced amount of lan- 
guage across sub-corpora. 

To determine equivalent sub-corpus sizes based on SID, densities were first 
obtained for each language. This was done on the basis of a set of 825 sentences 
of Korean, German and English that were translation equivalents of each other 
and of mixed translation direction. The sentences were obtained from the Ta- 
toeba database of sentence translations (Ho 2009). Information density was then 
calculated as the ratio of the total number of syllables found in the Korean sen- 
tences (baseline) to the number of syllables of German and English respectively. 
This resulted in the quotients given in the final column of table 1. These indicate 
that Korean has the lowest SID, followed by German and then English, which 
packs the most information into a single syllable. These figures were cross-vali- 
dated against those obtained by Oh using different data (Oh et al. 2014; Oh, per- 
sonal communication) and proved closely similar. Densities were then used to 
calculate the target number of syllables needed for each sub-corpus by dividing 
the baseline (Korean) syllable count by the SID for each of the other languages. 
The resulting figures are again shown in table 1. As the word counts of table 1 
indicate, although Korean features the lowest SID (therefore requiring the highest 
number of syllables), Korean words contain the most syllables on average and so 
when measured in words, the Korean sub-corpus is the smallest, followed by the 
German and then the English language sub-corpus. The amount of language 
compared, however, is equivalent. 
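
The two calculations described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and all syllable totals below are hypothetical toy values, not the Tatoeba figures or the corpus sizes used in the study.

```python
def information_density(baseline_syllables, language_syllables):
    """Ratio of baseline (here: Korean) syllables to another language's
    syllables for the same set of translation-equivalent sentences."""
    return baseline_syllables / language_syllables


def target_syllable_count(baseline_corpus_syllables, sid):
    """Number of syllables a sub-corpus needs in order to match the baseline."""
    return round(baseline_corpus_syllables / sid)


# Hypothetical syllable totals for the same translated sentences (toy values).
sentence_syllables = {"Korean": 1200, "German": 1000, "English": 800}
sid = {
    lang: information_density(sentence_syllables["Korean"], count)
    for lang, count in sentence_syllables.items()
}

# Hypothetical size of the baseline (Korean) sub-corpus, in syllables.
targets = {lang: target_syllable_count(6000, density) for lang, density in sid.items()}
```

With these toy figures the densities are 1.0, 1.2 and 1.5, and the targets are 6,000, 5,000 and 4,000 syllables: the higher the density of a language, the fewer syllables its sub-corpus needs.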

In terms of the actual process of corpus construction, the full Wikipedia 
dumps for all articles in Korean, German and English (as per February 2013) were 
downloaded, divided into one document per article and then cleaned and 
stripped of Wikipedia’s XML and non-textual information using WikiExtractor 
(Attardi and Fuschetto 2012). The relevant documents as per table 1 were then 
compiled into a trilingual corpus, observing the target syllable and document 
counts as outlined above. Random paragraphs of some documents were left out 
in order to achieve the target syllable count within the necessary number of doc- 
uments. To facilitate the subsequent analyses, a morphological annotation layer 
was added. For German and English, TreeTagger (Schmid 1994) was used to add 
part-of-speech, lemmas and morphological parsing; HanNanum (Park 2011) was 
used to add the same to the Korean sub-corpus, additionally annotating mor- 
pheme boundaries. 
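
The step of leaving out random paragraphs to reach a target syllable count might look roughly as follows. This is only a sketch under assumptions: the chapter does not specify the exact trimming procedure, so the function `trim_to_target`, its stopping rule (stop once the total is at or below the target) and the stand-in syllable counter are all hypothetical.

```python
import random


def trim_to_target(documents, target_syllables, count_syllables, seed=0):
    """Drop random paragraphs until the total syllable count is at most the target.

    `documents` is a list of documents, each a list of paragraph strings.
    Every document keeps at least one paragraph, so the document count is unchanged.
    """
    rng = random.Random(seed)
    docs = [list(paragraphs) for paragraphs in documents]  # leave the input untouched
    total = sum(count_syllables(p) for doc in docs for p in doc)
    while total > target_syllables:
        trimmable = [doc for doc in docs if len(doc) > 1]
        if not trimmable:
            break  # nothing left to remove without losing a document
        doc = rng.choice(trimmable)
        removed = doc.pop(rng.randrange(len(doc)))
        total -= count_syllables(removed)
    return docs, total


def toy_syllable_count(paragraph):
    """Stand-in counter: treats each whitespace-separated token as one syllable."""
    return len(paragraph.split())


corpus = [["a b c", "d e f"], ["g h i", "j k l"], ["m n"]]
trimmed, total = trim_to_target(corpus, target_syllables=11,
                                count_syllables=toy_syllable_count)
```

Keeping at least one paragraph per document reflects the requirement that sub-corpora match in number of documents as well as in overall size. A real syllable counter would have to be language-specific.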
3.2 Identification of FL


This section describes the procedure employed in the identification of items of FL 
in corpus data and the options available within the procedure to simulate various 
underlying FL-concepts. Above, items of FL were characterised as expressions 
representing habitual ways of putting things in a speech community. The idea of 
conventional ways of putting things implies that there are both units of meaning 
(i.e. things to be ‘put’), and linguistic forms conventionally associated with those 
meanings (i.e. ways of putting them). For the purposes of automatic identification 
and extraction, therefore, the operationalization in (1) was used: 


(1) Frequent sequences of linguistic elements forming a semantic unit 


Linguistic elements were taken to be word forms in the first instance (more specif- 
ically, white space delimited orthographic words) with the option to also consider 
lemmas (i.e. words abstracted away from features like case marking), morphemes 
(i.e. sub-lexical units of meaning) and combinations of these. Sequences of 2 to 9 
elements in length were considered. Following the corpus-linguistic strand of 
thinking on FL, conventionalisation was measured via frequency of occurrence 
in corpus material; frequent was taken as minimally occurring twice per million 
words. A semantic unit was deemed a word sequence possessing the sort of se- 
mantic unity typical of words and structurally complete phrases. Semantic unity 
was also attributed to sequences that, while lacking this unity, can acquire it 
through the addition of a single, semantically or formally restricted variable ele- 
ment at either edge of the sequence (such as when in search of does not form a 
full semantic unit unless a variable element on the right is added, i.e. in search of 
X where X is restricted semantically to something prized that is being pursued). 
For reasons of practicality, the phenomenon of sequence-internal variable slots 
(such as at the [young/early/average/premature] age of X) was not specifically catered
for, as only continuous sequences of elements were extracted. There is no
indication that this decision affected the three tested languages unequally, and 
the most frequent fillers of variable slots will be extracted in-situ as an additional 
sequence type (i.e. at the age of X, at the early age of X and at the young age of X 
as separate types). For a more detailed discussion of internal variability, see 
Buerki (2016). 

Step 1: Word sequence extraction with frequency filter
Step 2: Consolidation of different lengths
Step 3: Lexico-structural filtering


Fig. 2: Main steps of the identification procedure


The actual identification of items of FL from each sub-corpus was conducted in 
three steps (cf. figure 2). The extraction from each sub-corpus of all word se- 
quences occurring at least twice per million words (step 1) was carried out using 
the N-Gram Processor (Buerki 2013). To aid accuracy, sequences across sentence 
and sentence-equivalent boundaries were blocked and an additive stop list was 
used. The stop list contained the 200 most frequent word forms of the respective 
language according to the Leipzig Corpus Portal (anon. 2001) and served exclusively
to eliminate sequences that are made up entirely of stop-listed (i.e. very
high-frequency) words.⁵ In step 2, the various lengths of identified sequences had
their frequencies consolidated and were combined into a single list using Sub- 
String (Buerki 2017). At step 3, lexico-structural filters were applied to the lists of 
sequences to remove sequences that were likely to lack semantic unity. One entry 
of the lexico-structural filter for English, for example, bars sequences ending in 
the word ‘and’ as most such sequences would fail to show semantic unity. A de- 
tailed discussion of the extraction procedure (applied to a different data set) is 
found in Buerki (2012). 
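
Steps 1 and 3 of the procedure can be illustrated with a minimal sketch. It is not the N-Gram Processor itself: the function name `extract_sequences`, its parameters and the pre-tokenised input format are assumptions made for illustration, and the only lexico-structural filter included is the English entry mentioned above (sequences ending in 'and'). Step 2 (frequency consolidation) is omitted.

```python
from collections import Counter

def extract_sequences(sentences, stop_words, min_per_million=2.0, n_min=2, n_max=9):
    """Return {sequence: frequency} for word sequences passing all filters.

    `sentences` is a list of token lists (hypothetical input format).
    """
    total_words = sum(len(sent) for sent in sentences)
    counts = Counter()
    # Each sentence is processed separately, so no sequence crosses a boundary.
    for sent in sentences:
        for n in range(n_min, n_max + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    # Convert the per-million threshold into an absolute count for this corpus.
    min_count = min_per_million * total_words / 1_000_000
    kept = {}
    for seq, freq in counts.items():
        if freq < min_count:
            continue  # frequency filter (step 1)
        if all(word in stop_words for word in seq):
            continue  # stop list: only sequences made up entirely of stop words
        if seq[-1] == "and":
            continue  # one example entry of a lexico-structural filter (step 3)
        kept[seq] = freq
    return kept
```

Applied to a full sub-corpus, the default threshold of two occurrences per million words corresponds to the frequency criterion in (1); on a toy corpus the threshold has to be raised for the filter to have any effect.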

Extraction accuracy was established as follows. A random sample (n = 300 
types) of automatically identified sequences in each language was rated for com- 
pliance with the operationalisation in (1) by the author and independently by an 
L1 speaker of the respective language acting as a research assistant. Extraction 
accuracy at the baseline (i.e. using sequences of orthographic word forms exclu- 
sively) varied between languages and raters in the range of 72% to 75% of se- 
quence types rated as operationalisation compliant. Recall (the comprehensive- 
ness of an extraction) is difficult to assess in this scenario, but is typically 
inversely related to accuracy, that is, higher accuracy leads to lower recall and 
vice-versa (Manning and Schütze 1999: chapter 5). The accuracy figures achieved
were therefore regarded as suited to present purposes. Notably, the achievement
of a narrow range of variation in extraction accuracy between the three languages
was critical because it means that comparability between extractions across languages
was successfully maintained. A higher accuracy for one language, for example,
would almost certainly have caused a lower number of sequences to be
extracted for that language, thus introducing a bias. Thus a robust identification
procedure was applied to enable the subsequent quantitative comparison of FL
across the three languages studied.

5 For German, a stop list based on the top 150 (rather than 200) most frequent words proved
sufficient to yield comparable extraction accuracy to the other languages.


3.3 Simulation of FL Concepts 


As noted above, the first (baseline) FL-concept tested for universality employed 
orthographic word form sequences as the basic building blocks. This represents 
a traditional FL-concept in that it accepts the multi-word level as the relevant 
level at which formulaicity is manifested and it is also very conservative in terms 
of fixedness - it takes the view that all elements of a habitual turn of phrase are 
fully fixed such that, for example, the sequences in (2) are deemed separate types 
of sequences, each needing to satisfy FL-status on its own, rather than being to- 
kens of one sequence. 


(2) consists of X
    consisting of X
    consisted of X
    consist of X


Two exceptions to full fixedness applied even at the baseline level (in addition to 
allowing variable slots at either edge): all numbers (whether in figures or words) 
were replaced by the label NUM, and occurrences of the names for months of the 
year were replaced by the label NMONTH. This allowed the identification of se- 
quences like those in (3) as a single type. 


(3) NUM days later (two/ten/21 days later)
    in NMONTH of that year (in April/July/August of that year)
    in the early NUMth century (in the early twentieth/17th century)
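
The two baseline substitutions can be sketched with regular expressions. The sketch below is an illustration under assumptions, not the procedure actually used: the word lists are deliberately short, the function name `mask` is hypothetical, and no attempt is made to disambiguate forms such as 'May' or 'one'.

```python
import re

# Month names (English only in this sketch).
MONTH = re.compile(
    r"\b(January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\b"
)
# Ordinals in figures (17th) plus a few ordinal words; the word list is illustrative.
ORDINAL = re.compile(
    r"\b(\d+(st|nd|rd|th)|seventeenth|eighteenth|nineteenth|twentieth)\b",
    re.IGNORECASE,
)
# Cardinals in figures plus a few number words; the word list is illustrative.
CARDINAL = re.compile(
    r"\b(\d+|one|two|three|four|five|six|seven|eight|nine|ten)\b",
    re.IGNORECASE,
)

def mask(text):
    """Replace month names with NMONTH and numbers with NUM (ordinals: NUMth)."""
    text = MONTH.sub("NMONTH", text)
    text = ORDINAL.sub("NUMth", text)   # ordinals first, so '17th' is not split
    text = CARDINAL.sub("NUM", text)
    return text
```

With these substitutions, the variants listed in (3) collapse into the single types shown there, so that their frequencies are pooled before the threshold is applied.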


Although adequate for many cases, previous studies have shown that as a general 
requirement, (almost) complete fixedness is not realistic as items of FL are subject 
to a substantial amount of variation (Wray 2002: chapter 14; Sinclair 2004: 161; 
Langlotz 2006; Dutton 2009). An exception here is the idea of lexical bundles 
(Biber et al. 1999; Biber and Conrad 1999) which uniquely requires complete fixedness.
Since there is no definition of lexical bundles independent of their operationalization
(resulting in a conflation of theory and method) it remains unclear
whether full fixedness is of theoretical importance to the idea of lexical bundles 
or simply a methodological expediency. 

As reported in the next section and expected on the basis of previous research 
(Granger 2014; Durrant 2013; Kim 2009), the baseline concept of FL failed to pro- 
duce comparable FL-densities in the three languages. Consequently, progressive 
changes were made to the FL-concept tested until approximate parity in FL-den- 
sities across the three languages was reached. In this iterative process, modifica- 
tions to the FL-concept were progressively stepped up through aspects of fixed- 
ness to more fundamental alterations concerning the level at which formulaicity 
applies. While at each stage, modifications to simulated FL-concepts were made 
incrementally and with a view to maintaining plausibility as far as possible, it is 
important to recall that the goal was to take the simulation to whatever level nec- 
essary to produce approximate parity in FL-density across languages, and subse- 
quently to assess whether the resulting comprehensively universal FL-concept is 
a plausible one or not. Thus it was never in doubt whether parity could be 
achieved (this is a relatively simple exercise), but rather what modifications 
would be necessary, to what extent alterations would be needed and whether the 
resulting concept was plausible. The results of this process are detailed in the 
next section. 


4 Results 


As a baseline for comparisons, figure 3 and table 2 present the results of an
FL-concept of (almost) complete fixedness that takes the orthographic (white space
separated) word sequence as the level at which formulaicity is manifested. Several
key observations result: first, the number of items of FL identified across the
languages is vastly different (in terms of both types and tokens)
and therefore the underlying concept of FL is clearly not universal in the compre- 
hensive sense. It is evident, therefore, that an understanding of FL similar to the 
baseline concept used here has to be regarded as a language-specific phenome- 
non in that density of occurrence varies greatly between languages. Perhaps the 
most prominent such concept is the idea of lexical bundles, which is even more 
fixed and depends to a much greater extent on (ultra-high) frequency of occur- 
rence as a defining characteristic than the baseline concept used here. A second 
immediate observation is that the number of items identified as FL in each language
parallels the placement of the respective language on the continuum of
synthesis (cf. figure 1). This confirms the dependence of the baseline concept of 
FL on typology - something distinctly undesirable for a concept of importance to 
language in general rather than certain languages only. 


Tab. 2: Items of FL under the baseline FL concept 


           FL-types    FL-tokens
Korean       10,617    1,480,862
German       19,114    2,677,999
English      25,712    3,727,071


Fig. 3: Items of FL under the baseline FL concept (bar charts of types and tokens for Korean, German and English)


By contrast, the figures obtained by employing a simulated universal concept of 
FL are presented in figure 4 and table 3. These figures show that it is entirely pos- 
sible to automatically identify a comparable number of items as formulaic in each 
of the languages. The question to consider is whether the underlying concept of 
FL is a plausible, coherent and sensible concept within the context of what is 
known about FL. To make this assessment, the changes to identification parameters
implemented to move from the baseline concept of FL to the simulated universal
concept are set out below, and subsequently assessed.


Tab. 3: Items of FL under the universal FL concept 


           FL-types    FL-tokens
Korean       24,345    3,619,171
German       26,577    3,807,337
English      25,712    3,727,071


Fig. 4: Items of FL under the universal FL concept (bar charts of types and tokens for Korean, German and English)
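
To make the contrast between tables 2 and 3 concrete, the counts can be expressed relative to the English figures, which are identical in both tables. The short sketch below does only this arithmetic; the function name `relative_to_english` is introduced here for illustration.

```python
# (FL-types, FL-tokens) as reported in tables 2 and 3.
baseline = {
    "Korean": (10617, 1480862),
    "German": (19114, 2677999),
    "English": (25712, 3727071),
}
universal = {
    "Korean": (24345, 3619171),
    "German": (26577, 3807337),
    "English": (25712, 3727071),
}

def relative_to_english(table):
    """Express each language's type and token counts as a proportion of English."""
    ref_types, ref_tokens = table["English"]
    return {
        lang: (round(types / ref_types, 2), round(tokens / ref_tokens, 2))
        for lang, (types, tokens) in table.items()
    }
```

Under the baseline concept, Korean reaches about 0.41 of the English type count and German about 0.74; under the universal concept the corresponding proportions are about 0.95 and 1.03, with token proportions of about 0.97 and 1.02.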


4.1 Adjustments 


The adjustments indicated below were implemented by adapting a version of the 
source corpus and then re-running the FL identification procedure with commen- 
surate adjustments to stop lists and filters where necessary. 
4.1.1 Fixedness


The first set of adjustments was made to the degree of fixedness: as pointed out 
above, phraseological research has long maintained that many items of FL re- 
quire certain types of flexibility. Such flexible items could be said to be under- 
specified to a degree, or specified at a more schematic level than the word form 
sequence — and they require adjustments to fit contexts of use. One type of flexi- 
bility is the occurrence of variable slots as seen above. Others are alternations in 
word order and inflectional morphology. The effects of morphological typology 
seen in the results reported above suggest that inflectional morphology is perti- 
nent to the differences observed in the data and so the first set of adjustments to 
the concept of FL was made to reduce fixedness in areas of inflectional morphology.

In Korean, this flexibility was simulated by removing case markers of subject
(-이/가 [i/ga]), object (-을/를 [eul/reul]) and topic (-은/는 [eun/neun]) as realised
by the bound morphemes indicated, as well as all plural markers (-들 [deul]).⁶ Notably,
the absence of these markers does not necessarily result in ungrammaticality
as they are "frequently omittable" (Sohn 2001: 231). Korean also possesses
an elaborate system of verbal (and in some cases adjectival) inflection to mark 
politeness levels (Sohn 2001: 231-241), though other aspects, such as grammati- 
cal person, are not marked morphologically. The formal style used in texts like 
Wikipedia articles, however, means that only a very narrow range of these inflec- 
tions is manifest, rendering intervention superfluous. To exemplify effects of ad- 
justments made, items in (4) can be seen united under a single sequence type (5) 
as a consequence of the adjustments. 


(4) 버스 정류장을 [beoseu jeongriujangeul]   bus stop-OBJ
    버스 정류장은 [beoseu jeongriujangeun]   bus stop-TOPIC
    버스 정류장이 [beoseu jeongriujangi]     bus stop-SUBJ
    버스 정류장 [beoseu jeongriujang]        bus stop

(5) 버스 정류장 [beoseu jeongriujang]        bus stop


Morphology to mark tense/aspect, mode and modality was left unadjusted: as
in other languages (including German and English), these are expressed partly
by inflectional morphology and partly periphrastically. A relaxation of fixedness
in this area would therefore not have contributed to addressing typological
differences between the languages compared.

6 There is some disagreement over whether these markers are more suffix-like (as assumed
here) or more word-like (cf. Sohn 2001: 231). As current orthography does not typically afford
these markers the status of orthographic word, they are taken as bound morphology here.

Methodologically, the adjustments mentioned were implemented by produc- 
ing a version of the source corpus that had all items deleted that were marked by 
the morphological parser as instances of the Korean subject, object, topic and 
plural markers. The FL-identification procedure was then re-run to produce a new 
list of items of FL. 
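
The deletion step can be sketched as follows. The input format (each orthographic word as a list of morpheme/tag pairs) and the tag labels SUBJ, OBJ, TOPIC and PLURAL are placeholders chosen for this illustration; they are not HanNanum's actual output format or tag names.

```python
# Placeholder tag labels for the markers to be deleted (hypothetical names).
DROP_TAGS = {"SUBJ", "OBJ", "TOPIC", "PLURAL"}

def strip_markers(parsed_words):
    """Rebuild the text with all morphemes carrying a DROP_TAGS tag removed.

    `parsed_words` is a list of words, each a list of (form, tag) pairs.
    Morphemes are simply concatenated, which ignores contracted surface forms.
    """
    words = []
    for morphemes in parsed_words:
        kept = "".join(form for form, tag in morphemes if tag not in DROP_TAGS)
        if kept:  # skip words that consisted only of deleted markers
            words.append(kept)
    return " ".join(words)
```

Applied to the first three forms in (4), this yields the single form in (5) in each case, so that their frequencies are pooled when the identification procedure is re-run.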

In the German sub-corpus, an equivalent reduction in fixedness was targeted
by masking all verbal inflections for grammatical person (but tense/aspect, mode
and modality were again retained as these are marked in all the languages under
investigation and masking them would therefore not target differences).⁷ Further,
all case and gender inflection was masked on definite and indefinite articles,
adjectives and nouns (but number distinctions were retained as they occur in the
English sub-corpus as well and masking them was deemed an overly harsh
generalisation for these languages). Again, to illustrate the effect of some of
these adjustments, sequences in (6) appear united under (7) after the adjustments.


(6) die Bundesrepublik Deutschland   the Federal Republic of Germany [nominative]
    der Bundesrepublik Deutschland   the Federal Republic of Germany [dative/genitive]
(7) ARTDEF Bundesrepublik Deutschland


Similarly, after adjustments the nine attested sequence types in (8) appear under 
(9) as four generalized types. 


(8) zur Verfügung stehen       be available [1st/3rd pers. pl, pres. tense]
    zur Verfügung steht        be available [3rd pers. sg/2nd pers. pl, pres. tense]
    zur Verfügung stehe        be available [3rd pers. sg, subjunctive I]
    zur Verfügung stünden      be available [1st/3rd pers. pl, subjunctive II]
    zur Verfügung stand        be available [1st/3rd pers. sg, past tense]
    zur Verfügung standen      be available [1st/3rd pers. pl, past tense]
    zur Verfügung stehende     available [adjectival, case/number marked]
    zur Verfügung stehenden    available [adjectival, case/number marked]
    zur Verfügung stehender    available [adjectival, case/number marked]


(9) zur Verfügung stehen IndPres   [indicative, present tense]
    zur Verfügung stehen IndPast   [indicative, past tense]
    zur Verfügung stehen Subj      [subjunctive]
    zur Verfügung stehend          [adjectival]


7 This was done by replacing finite verbs with lemmas marked for tense and mode. 
As shown, distinctions in tense and mode are retained, but flexibility is introduced
with regard to grammatical person (for verbs) and case/number marking
on adjectival expressions. Notably, not all available forms of the respective
inflectional paradigms are attested in the corpus (zur Verfügung stehen [be available]
does not occur in the data with inflection for first or second person singular,
for example) and some forms occur only a few times. This is partly due to the 
particularities of the corpus, of course, but is also a manifestation of a degree of 
fixedness of the expression. Therefore, even when extensive flexibility in terms 
of inflection is introduced, this does not necessarily lead to the identification of 
as many more items of FL on the basis of heightened recurrence as might be ex- 
pected. The examples also show that the range of inflectional morphemes is fur- 
ther limited by the fusion of morphemes - the last three forms in (8) represent all 
possible combinations of case and number marking. 

Methodologically, these adjustments were again achieved by modifying a 
copy of the source corpus in which all German verb forms, adjectival forms, forms 
of the definite and indefinite article and noun forms were replaced with the re- 
spective lemma (plus the added information on mode, tense, number, etc. that 
was to be retained) as seen in (9). The FL-identification was then re-run. 
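
A minimal sketch of this replacement step is given below. The token format (dictionaries with `form`, `lemma`, `pos` and `mode_tense` keys) and the part-of-speech labels VFIN, ADJ and NOUN are assumptions made for illustration; only the labels ARTDEF, ARTINDEF, IndPres, IndPast and Subj are taken from the examples above. The retention of number information on nominal forms is left out of the sketch.

```python
def generalise(token):
    """Return the generalised form of one annotated token (hypothetical format)."""
    pos = token["pos"]
    if pos in ("ARTDEF", "ARTINDEF"):
        return pos  # articles are reduced to a lemma label
    if pos == "VFIN":
        # finite verbs: lemma plus a retained mode/tense label, person is masked
        return f'{token["lemma"]} {token["mode_tense"]}'
    if pos in ("ADJ", "NOUN"):
        return token["lemma"]  # case and gender marking is masked
    return token["form"]  # all other tokens keep their word form

def generalise_sequence(tokens):
    """Join the generalised forms of a token sequence into one string."""
    return " ".join(generalise(token) for token in tokens)
```

Under these assumptions, both lines of (6) map onto (7), and the finite verb forms in (8) map onto the first three types in (9).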

In English, an equivalent level of flexibility is inherent due to the absence of 
some of the equivalent inflectional morphology on the one hand and the isolating 
morphology on the other. The effect of the latter is seen in (6), where English 
would require the addition of the free morpheme of for genitive case marking in 
the second line, but this would still leave the recurring 5-element sequence the 
Federal Republic of Germany intact (and easily identifiable) in both lines of (6). 

Although further flexibility could have been introduced to the simulated FL 
concept, this was not deemed judicious because the adjustments introduced al- 
ready cover the aspects of flexibility that are pertinent to the typological differ- 
ences in morphology present in the data set: there would have been little gain, 
for example, in such sweeping adjustments as a complete generalisation over 
tense marking because morphological tense marking is not a feature on which 
the three languages differ categorically.⁸ Despite little further room for sensible
reductions in fixedness, checks at this stage of the simulation indicated that a
comprehensively universal concept of FL had not yet been achieved. The simulation
was therefore stepped up to comprise another area to which previous research
has drawn attention: the level(s) at which formulaicity operates.

8 While the focus of this study is on morphological differences, it is likely that a simulated
generalisation over aspects of word order would reduce differences in this regard between Korean
and German as languages with more word order variation on the one hand and English with less
word order variation on the other (although, of course, there is some word-order variation in
English as well; cf. Heid 2012). It has to be left to future studies to ascertain the magnitude of the
impact of these differences.


4.1.2 Levels of Focus 


The final step to the universal concept of FL that produced the figures of table 3 
required adjustments to the levels at which constituent elements of FL are recog- 
nised, to include certain units at the morpheme level. In Korean, 14 common 
bound morphemes occurring word-finally (the translation equivalents of which 
are generally free morphemes in German and English), were separated from their 
hosts so that they became eligible for recognition as independent constituents of 
formulaic sequences. The morphemes concerned are: -의 ([ui] of); -에서 ([eseo]
from); -에 ([e] at); -로/으로 ([ro/euro] towards); -과/와 ([gwa/wa] and); -하고 ([hago]
and); -고 ([go] and); -에게 ([ege] to); -도 ([do] too); -부터 ([buteo] from); -까지
([kkaji] until); -만 ([man] only); -마다 ([mada] every); -지 ([ji] not). This was
implemented by identifying all instances of the named morphemes in a copy of the
source corpus and isolating them from their hosts through the insertion of white
space characters. Identification occurred with the help of part-of-speech tags
supplied by the morphological parser, as many of the forms involved, being single
or double syllables, also occur as constituents of other lexical items, or as
homographs.
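
The separation step can be sketched in the same way as the deletion step above. Again, the input format and the tag label BOUND are placeholders for this illustration rather than the parser's actual output; the set of forms lists the 14 morphemes with their allomorphs.

```python
# The 14 word-final morphemes named in the text, including allomorphs.
SEPARABLE = {
    "의", "에서", "에", "로", "으로", "과", "와", "하고",
    "고", "에게", "도", "부터", "까지", "만", "마다", "지",
}

def separate(parsed_words, separable=SEPARABLE, bound_tag="BOUND"):
    """Insert white space before a word-final separable bound morpheme.

    `parsed_words` is a list of non-empty words, each a list of (form, tag) pairs.
    The tag check prevents homographs and parts of other words from being split.
    """
    out = []
    for morphemes in parsed_words:
        *host, (last_form, last_tag) = morphemes
        if host and last_tag == bound_tag and last_form in separable:
            out.append("".join(form for form, _ in host))
            out.append(last_form)
        else:
            out.append("".join(form for form, _ in morphemes))
    return " ".join(out)
```

Because the decision rests on the tag rather than on the written form alone, a noun such as 지도 (map) stays whole although it ends in the syllable 도.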

In German, compounds consisting of common words were separated so their 
constituents become eligible for recognition as independent constituents. Alt- 
hough both Korean and English feature compounds as well, German is particu- 
larly noted for its use of compounding and the length of its compounds (cf. Russ 
1994: 221-225), making this an important area where formulaicity remains unrec- 
ognized in one language but picked up in others due only to differences in mor- 
phological typology. In addition, German compounds are typically single ortho- 
graphic words where many English and some Korean compounds consist of 
multiple orthographic words (hyphens were treated as separate words in all lan- 
guages, resulting in hyphenated compounds like open-minded being treated as 
3-element expressions). In example (7) above, the cross-linguistic effect of Ger- 
man compounds is drawn into focus as the German expression consists of three 
elements, whereas the English gloss consists of five. After compound-separation, 
(7) appeared as in (10) featuring four elements.⁹


9 Deutschland might also be split but was left whole by the splitting software (see below).
(10) ARTDEF Bundes republik Deutschland


jWordSplitter (Naber 2015) was used to divide German compounds in a copy of 
the source text and the FL-identification procedure was repeated. jWordSplitter 
divides noun compounds and some verbal and adjectival compounds. Due to 
necessarily limited coverage of the morphological dictionaries used, jWordSplitter
is in practice most effective at splitting compounds that consist of common word
forms, giving it a fairly light touch that turns out to be well suited for the level of 
adjustments required in the present case. 
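
The general idea of dictionary-based compound splitting can be illustrated with a small sketch. This is not jWordSplitter's algorithm or API (jWordSplitter is a Java library); the function `split_compound`, the list of linking elements and the toy lexicon are assumptions made for illustration, and only two-part splits are attempted.

```python
# Linking elements that may follow the first constituent (illustrative list).
LINKERS = ("", "s", "es", "e", "n", "en")

def split_compound(word, lexicon, min_len=3):
    """Split `word` into two parts if both are licensed by `lexicon`.

    The right part must be a lexicon entry; the left part must be a lexicon
    entry optionally followed by a linking element, which stays attached to it.
    Returns [word] unchanged if no split is found.
    """
    lower = word.lower()
    for i in range(min_len, len(lower) - min_len + 1):
        left, right = lower[:i], lower[i:]
        if right not in lexicon:
            continue
        for link in LINKERS:
            stem = left[:len(left) - len(link)] if link else left
            if left.endswith(link) and stem in lexicon:
                return [word[:i], word[i:]]
    return [word]
```

With a lexicon limited to common words, the sketch has the same light touch described above: a compound is split only when both constituents are known, and words such as Deutschland remain whole when their parts are not in the lexicon.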

As before, no equivalent adjustments to the English language data were necessary
as the adjusted Korean bound morphemes are already independent in English
(as well as German) and the adjustment to German compounds now approximated
the state of compounds in English (and Korean). A sample of identified
items of FL, including items identified only after adjustments (marked with an 
asterisk), is shown in table 4. 


Tab. 4: Sample of identified FL. Notes: * items resulting from an adjusted FL-concept; X = variable
slot; NUM = numbers; ARTDEF = definite article


Korean
items of FL                                                       gloss
어느 정도 [eoneu jeongdo]                                          roughly, to some extent
영어 로* [ieongeo ro]                                              in English
지금 도* [jigeum do]                                               even now
X 과 함께* [X gwa hamkke]                                          together with X
박사 학위[를] 취득하였다 [baksa hakwi[reul] chwideukhaeotta]         received [their] PhD
유럽 연합 의* [yureop yeonhap ui]                                   of the European Union
X 후 곧바로 [X hu gotbaro]                                          right after X
X 때 마다* [X ttae mada]                                           always when X
그 다음 에* [geu daeum e]                                           after that
X 에 따라 달라진다 [X e ttara dallajinda]                            differ depending on X
X 있는 것으로 알려져 있다 [X inneun geoseuro allyeojyeo itta]         it has become known that there is X
첼로 협주곡 [chello hyeobjugok]                                     cello concerto
X (으)로 인하여* [(eu)ro inhayeo]                                   because of X
X 와 같이* [X wa gachi]                                            with X
오래 된 [orae doen]                                                old (lit. long been)
X 에 대한 지원* [X e daehan jiwon]                                  support for X
NUM 살 의 나이 로* [NUM sal ui nai ro]                              at the age of NUM


German
items of FL                            gloss
aus diesem Grund                       for this reason
zu diesem Zeit punkt*                  at this point in time
ARTDEF Stadt zentrum*                  the city centre
in diesem Sinne                        in this way/sense
anhand von X                           by means of X
Tage buch*                             diary (day book)
auch sonst                             at any rate / anyway
immer größer                           bigger and bigger
dazu führen, dass X*                   lead to the outcome that X
siehe unten                            see below
miteinander verbunden                  connected to each other
wie zum Beispiel X                     as for example X
DEFART so genannte/r/n X*              the so-called X
bis zu seinem Tod NUM                  until his death in NUM
hinzu kommen/kommt, dass X*            added to this, X
nach dem Krieg                         after the war
bereits im NUMten Jahrhundert          going back to the NUMth century
in Frage gestellt                      questioned
stehen/steht unter Denkmal schutz*     be listed (i.e. be a listed building)


English
items of FL
X was released in NUM
until his death in NUM
large amounts of X
natural resources
mainland China
Member of Parliament
internal combustion
consistent with X
open to the public
the Olympic Games
on several occasions
science and technology
in the US state of X
by the early NUMs
special effects
it is thought that X
incompatible with X
on the grounds of X
in a NUM – NUM victory over X
due to the fact that X

4.2 Assessing the Simulated Concept of FL


Having reviewed the underpinnings of the simulated universal concept of FL that 
underlies the figures presented in table 4, we can now outline its main features: 
with regard to fixedness, the universal concept of FL allows flexibility minimally 
in the areas of inflectional morphology to do with case marking, marking of 
agreement and if necessary marking of number to allow items of FL that specify 
these aspects at a more schematic level than the word form. Additionally, the uni- 
versal concept of FL used is flexible with regard to the locus of formulaicity and 
recognizes formulaicity at the morpheme level.¹⁰

It is important to note that in this context flexibility does not mean that in all 
cases items of FL must be pitched at the most schematic level: the results dis- 
cussed suggest that many items may be pitched at that level, but others are not 
and there will be different elements of the same item at differing levels of sche- 
maticity. For example, the German item of FL eines Tages (some day, at some 
point in time; lit. of a day) is fixed in the genitive case, but the more schematic 
ARTINDEF Tag¹¹ (a day) is still a common turn of phrase forming a semantic unit
regardless of case marking. Similarly, the Korean phrase 예를 들면 [iereul
deulmeon] (for example, lit. if [we] take an example) invariably specifies the object
case marker -를 [-reul] (including in all 1,214 occurrences of the expression in the
corpus, despite case marking in general often being omitted, as discussed above).
Similar examples are mentioned by Granger (2014: 60) (see also Tognini-Bonelli 
(2001) for a defense of the word form as relevant unit). In this sense, the identifi- 
cation of items of FL carried out above was a simulation of a flexible FL concept; 
an actual identification based on a flexible FL concept would identify items at
their most relevant level(s) of schematicity, which might well differ for each
constituent element.

10 It may be argued that instead of the flexibility claimed to be necessary, it may be sufficient
(or at least partially sufficient) simply to adjust the minimum frequency level for less isolating
languages as part of the identification procedure (as Granger 2014 suggests), or that, in effect,
the need for flexibility is created artificially by using frequency as part of the operationalization
of FL. But this argument would be problematic: frequencies would have to be lowered very
substantially from an already low threshold to get a similar effect because unlike in certain other
procedures, frequency is only one element of the operationalization of FL used. A substantial
lowering of threshold frequency would result in a much lower accuracy of identification (unless
replacement filtering devices are used), meaning that the additional items would be unlikely to
be bona fide items of FL in the sense used in this study. More fundamentally, frequency bears
theoretical significance as it is used to operationalize conventionality and so is a fundamental,
rather than accidental, aspect of FL according to the understanding of FL put forward.
Consequently, adjustments to take account of this are justifiable.

11 ARTINDEF is the label used for a lemma of the indefinite article.

It can now be considered whether the universal concept of FL described is 
plausible. There are two main considerations that strongly suggest that this uni- 
versal concept of FL, besides succeeding empirically, also forms a coherent and 
sensible concept from the point of view of theory. First, the features of increased 
flexibility in levels of schematicity and locus of formulaicity are not novel fea- 
tures, but have been suggested, albeit more tentatively, by previous studies as 
outlined above. This analysis has principally added an indication of their scope 
and necessity. Second, specification at various and mixed levels of schematicity 
(with some elements highly fixed in all aspects and others much less so) and the 
loss of the significance of the distinction between the word and morpheme levels 
are features that are not unusual: if we turn to constructionist approaches to 
grammar (also known as Construction Grammar and noted for their tight inter- 
facing with phraseological theory and data, cf. Van Lancker Sidtis 2015; Buerki 
2016), these features are not only accommodated but predicted as features for lin- 
guistic structures across language (Hilpert 2014; Hoffmann and Trousdale 2013). 
In constructionist theory, all of language consists of constructions that are spec- 
ified at the full range of levels of schematicity from fully substantive (lexically 
fixed) to fully schematic. For example, the fully schematic ditransitive construc- 
tion in (11) is as much a bona-fide construction as the partially lexically substan- 
tive construction in (12) or the fully substantive construction in (13). 


(11) «Subj V Obj1 Obj2» (e.g. I handed her the book) 
(12) «the Xer the Yer» (e.g. the bigger the better)
(13) «blue jeans» 


Further, in constructionist theory, constructions exist from the level of single 
morpheme or morpheme group to that of phrase without a theoretically signifi- 
cant distinction between word and morpheme level constructions (cf. table 1.1 in 
Goldberg 2006: 5). Consequently, constructions like «prebook» or «over-V» (as in
overeat, oversleep, etc.) are as much constructions as «blue jeans» or phrase-level 
constructions (11) and (12). From a constructionist viewpoint it therefore comes 
as no surprise that a universal concept of FL should admit items that are specified 
at various and mixed levels of schematicity, such as specification of the exact 
word form for all elements as in (14), specification at word form level for all but 
one element in (15) where the second element is specified at a more schematic 
level that allows case marking flexibility, specification at a fairly abstract level as 
in (16), which only contains two fully substantive elements, or indeed (17) which 
is formulaic at the morpheme sequence level. 
(14) eines Tages (one day)
(15) 박사 학위(를) 취득하였다 [baksa hakwi(reul) chwideukhaeotta] (received [their] PhD)
(16) X [was/is to be/will be/is due to be] released in NUM
(17) Tagebuch (diary, lit. book of days)


The comprehensively universal concept of FL outlined above therefore not only 
succeeds in demonstrating its comprehensive universality across the three lan- 
guages in our data, but also presents itself as a plausible concept of FL, taking 
previous phraseological research and insights from constructionist theory into 
account. 


5 Discussion and Outlook 


The results of this study fall into three general areas of significance. The first con- 
cerns the concept of FL and in what sense it is applicable universally to different 
types of languages. Here results show that it is possible to construct a concept of 
FL that applies in equal measure to isolating languages such as English with a 
low morpheme-per-word ratio, languages like German that feature a vast array of 
case, gender and agreement morphology, as well as polysynthetic, agglutinating 
languages like Korean, where individual words are often equivalents of whole 
phrases in more isolating languages. This is significant, because although the ex- 
istence of FL is documented in a wide range of languages, previously FL was not 
subject to large-scale cross-linguistic comparison of quantitative aspects and 
such comparisons as have been conducted have yielded stark cross-linguistic dif- 
ferences in the number of items of FL identified. This posed fundamental chal- 
lenges to the adequacy of the theoretical claims outlined above, most principally 
to the central importance of FL to the functioning of language in general, but 
these claims have now been safeguarded by the presentation of a plausible uni- 
versal concept of FL. 

Second, results crucially also reveal that this cross-linguistically viable con- 
cept of FL must incorporate two key aspects that have hitherto not been promi- 
nently discussed or applied: on the one hand, the inclusion within the concept of 
FL of more flexible, more schematic forms that require fine-tuning at time of use 
(as well as fully substantive forms that do not) is a requirement for a plausibly 
universal concept of FL, not an optional or marginal feature. While some items of 
FL are best identified as fully substantive forms that allow their use in context 
without any further adjustments, more schematic forms that require morpholog- 
ical fine-tuning must equally be recognised as FL. In the data, this fine-tuning 
typically consists of adjustments for case, number, or person, but may include 
other aspects. The point here is that without the ability to stipulate schematic 
forms, many individual, fully substantive forms will on their own be too rare to 
be reasonably considered common turns of phrase in their own right and this 
shortfall has vastly more serious consequences for languages that use, for exam- 
ple, case marking than for languages that do not, resulting in vastly different 
amounts of FL being detected between such languages. On the other hand, an 
adjustment of a more radical nature is required if the notion of FL is to be a uni- 
versal one: the traditional fixation on FL as sequences of words needs to be rela- 
tivized and sequences of sub-word-level linguistic items need to be eligible for 
recognition as legitimate items of FL. Again, the data indicate that this is necessary 
for the notion of FL to become universally applicable. Thus results indicate that 
a universal concept of FL is viable but absolutely requires the admission of se- 
quences that need a degree of fine-tuning at the time of use, and further requires 
a discounting of the importance of the word level that has hitherto been a prom- 
inent feature in conceptualisations of FL. 

Third, results also suggest adjustments to the place of FL in an overall theory 
of language. In terms of theories of linguistic structure (i.e. syntax and morphol- 
ogy), notionally, FL can be integrated into various frameworks (cf. Wray 2008: 
chapter 7) or it may be envisaged as a completely separate module or “subsystem” 
(Dobrovol'skij 1992: 279) of the grammar. However, the requirement for a 
universal concept of FL to discount the significance of the word level, and the 
inclusion of sequences at differing levels of schematicity, strongly support and 
integrate with constructionist approaches to grammar. These approaches place 
linguistic constructions (from fully substantive phrases to fully schematic con- 
structions), rather than words and rules of combination, at the centre of theoret- 
ical thinking. Items of FL function in this view as constructions of a particular, 
namely a predominantly substantive, type. Therefore, a universal concept of FL 
suggests a natural integration with constructionist theories of language where FL 
is able to take up an important place, commensurate with its importance in ac- 
counting for how language operates. 

There are of course also a number of limitations to consider: only some, 
though arguably the most pertinent, aspects of how languages vary have been 
considered in this study. Detailed consideration of other aspects, such as the ef- 
fects of freer word orders in some languages, and other features of languages not 
investigated in this study will no doubt add further important detail to a universal 
concept of FL. In its outline however, the concept put forward is unlikely to 
change dramatically. 

Overall, results obtained offer strong evidence for a cross-linguistically ro- 
bust notion of FL and how it fits into a larger theoretical context. This advances 
the field of research into FL by placing it on a firmer footing and by affirming its 
importance in accounting for how language works. This firmer footing can sustain 
current interest in the phenomenon and contribute to stimulating further research 
into theoretical as well as applied aspects of FL. 
Abdullah Eisa 
Marhaban: Reconsidering the Criteria of an 
Arabic Phraseme 


Abstract: This paper deals with the difficulties that face Arabic phraseology when 
the established criteria of phraseology as defined by Gries (2008) are applied. The 
paper focuses especially on the number of elements involved in a phraseme and 
here we introduce the concept of one-word + zero-element phrasemes in Arabic. 


1 Introduction 


Studies on Arabic phraseology focus on empirical applications of Arabic phrasemes. 
Scholars adopted the definition already established in the research to define 
an Arabic phraseme (Müller 1993, 2001; Ghariani Baccouche 2007). However, 
an Arabic phraseme challenges the established criteria for a phraseme. In order 
to illustrate this, we base our discussion of Arabic phraseology on the criteria for 
a phraseme as defined by Gries (2008: 6): 

1. Natural elements are lexemes or lemmas (words) 

2. The number of elements is two or more 

3. Frequency of co-occurrence is greater than expected 

4. The distance between elements is usually short (interrupted by just one 
word) or nonexistent 

5. The flexibility of elements should not exceed more than one element 

6. A phraseme should function as one semantic unit 


The six criteria “underlie most phraseological work” (Gries 2008: 5) 
and provide a precise definition that would help phraseologists and researchers 
from other fields to identify a phraseme in general and an Arabic phraseme in 
particular. 

Gries suggested that his first criterion (the nature of the elements) included 
not only lexical items, but also grammatical patterns (Gries 2008: 5). He further 
argued that lexical items and lemmas should be accepted as phraseological (Gries 
2008: 5). 

As to the second criterion (the number of elements), a phraseme must be cre- 
ated from two or more elements. The minimum number of elements in the case of 
Arabic should be the focus of more scholarly attention, since the morphological 
concepts manhut and al-murakkab al-mazji, both of which originally contained 
two or more lexical items, are dealt with as single lexemes in dictionaries. 

With regard to the third criterion (the number of occurrences), Gries claims 
that a phraseme can be identified as such “if its observed frequency of occurrence 
is larger than its expected one” (Gries 2008: 5). Although the strong tendency of 
two items to co-occur has been mentioned in most of the published definitions of 
phrasemes, such a method requires a well-established corpus, which does not 
exist for classical Arabic. 
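
The comparison of observed and expected frequency can be made concrete with a minimal sketch. It assumes the simplest possible baseline, namely that the expected count of an adjacent word pair is what independence of the two words would predict; Gries (2008) does not prescribe this particular calculation. The function name and the toy token list are invented for illustration and are not corpus data.

```python
from collections import Counter


def observed_vs_expected(tokens, first, second):
    """Return (observed, expected) counts for the adjacent pair first + second.

    Expected is the count predicted if the two words occurred independently:
    count(first) * count(second) / number of adjacent-pair positions.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    slots = len(tokens) - 1  # number of adjacent-pair positions
    observed = bigrams[(first, second)]
    expected = unigrams[first] * unigrams[second] / slots if slots > 0 else 0.0
    return observed, expected


# Toy, invented token list (an assumption for illustration only).
toy_tokens = (
    "we break the ice then they break the ice and we eat the ice".split()
)
obs, exp = observed_vs_expected(toy_tokens, "break", "the")
print(obs, round(exp, 2))
```

A pair would count as a candidate phraseme under the third criterion only when the observed value exceeds the expected one, which presupposes a corpus large enough for both counts to be meaningful.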

Regarding the fourth criterion (the permissible distance between the ele- 
ments), Gries adopted a “widespread broader perspective” that allowed word col- 
locations that contained discontinuous items to be identified as phrasemes (Gries 
2008: 5). Arguments in favour of this criterion can be found in papers based on 
N-gram studies of natural language processes (Gries 2008: 5).¹ However, applying 
this criterion to Arabic would tend to conflict with Arabic’s syntactic nature as a 
free-order language. 

The fifth criterion (the degree of flexibility of the elements) revolves around 
the question of how flexible a phraseme ought to be. What tenses can it contain 
and still be considered a phraseme? What is the level of lexical flexibility for a 
phraseme? Completely inflexible forms, i.e. full-phrasemes are accepted, but the 
criterion also allows “relatively flexible patterns”, such as phrases that allow 
multiple tenses but exclude one particular tense (Gries 2008: 5). Also, the crite- 
rion includes “partially lexical-filled patterns”. 

Lastly, the sixth criterion (semantic unity) is a semantic one, acting as the 
core of the definition of a phraseme: any word combination deemed a phraseme 
should function as one semantic unit (Gries 2008: 6). However, a debate has arisen 
over whether a phraseme should be semantically non-compositional. Gries argued 
that this was unnecessary, but advocated unity of meaning (Gries 2008: 6). 
The final definition of a phraseme he arrived at, based on the foregoing six crite- 
ria, was as follows: 


[A] phraseologism is defined as the co-occurrence of a form or lemma of [a] lexical item and 
one or more additional linguistic elements of various kinds which function as one semantic 


1 N-grams, bigrams, and trigrams are the extracted results of a study that statistically analyses 
"recurrent continuous sequences of two or more words". Phraseological studies based on N- 
gram analysis have usually advocated the continuity of the items of a phraseme (Granger and 
Paquot 2008: 38-39). 
unit in a clause or sentence and whose frequency of co-occurrence is larger than expected 
on the basis of chance[.] 
(Gries 2008: 6) 


The six parameters and definitions discussed above provide us with six clear cri- 
teria for the definition of a phraseme. These criteria focus on three main concepts: 
the individual elements, the occurrence of the elements as a single unit, and the 
semantic unity of the phraseme. Although this provides a comprehensive 
definition of a phraseme within the frame of the European languages, it 
needs to be examined within Classical Arabic, the object of this study. 

In this paper I will investigate the challenges that applying the criteria of a 
phraseme poses, aiming to redefine a phraseme within the context of Arabic. 


2 Investigating the Criteria 


2.1 The Nature of a Phraseme Element 


According to the definition proposed by Gries and adopted in this paper, all ele- 
ments of a phraseme should be words. Words, according to Gries, are “a form or 
lemma of lexical items and any kind of linguistic element” (Gries 2008: 5). The 
term ‘word’, however, requires further discussion. In Arabic tradition, ‘word’ is 
defined as a letter (harf), a noun, or a verb (Ibn ‘Aqil 1980: 14). A noun is thus a 
word with an independent meaning but no tense; a verb is a word with an independent 
meaning and a tense; and a letter is a word with neither (Ibn ‘Aqil 1980: 
15).² Also, given that pronouns in Arabic are considered to be nouns, as they refer 
to a meaning by themselves and function as nouns grammatically (al-Nili 1999: 
596; Ibn ‘Aqil 1980: 15), a suffix pronoun - e.g., kaf al-khitab [second person sin- 
gular] - is considered an independent element of a phraseme and can, with an- 
other lexeme, form a phraseme (Ibn ‘Aqil 1980: 31). As a result, any word of any 
word-class, whether a noun, pronoun, verb, or harf, can form a phraseme under 
certain conditions as in hananay-ka (your [dual] mercies) = be patient. The sec- 
ond element of the phraseme hananay-ka is the second person pronoun käf al- 


2 In traditional Arabic grammar, conjunctions and determiners are included in the harf word- 
class, while pronouns are included in the noun word-class (Ibn ‘Aqil 1980: 15). 
khitab [second person singular], which with the first element hanänayn forms the 
phraseme.³ 

Additionally, the Arabic definition of ‘word’ implies that there is no distinction 
between lexical items and grammatical patterns in terms of fulfilling the requirements 
of phraseme elements; i.e., the granularity level of the element can 
be either a lemma or a morphological form (Gries 2008: 15). 


2.2 The Number of Elements 


A phraseme, by definition, comprises a phrase. An English phrase, for instance, 
is defined as “any syntactic unit which includes more than one word and is not 
an entire sentence” (Matthews 1997: 255). Applying this criterion to the Arabic 
language calls for further investigation, however, due to the existence of what I 
will term ‘one + zero elements’ phrasemes. In the following, I will discuss how 
the word marhaban is actually a phrase, in the deep structure, and a phraseme 
made up of one explicit element and a zero element. 

Some Arabic phrasemes are made up of two elements, one explicit and the 
other implicit, i.e., understood from context. The words marhaban [to be in a spacious 
place] = to be welcome, and ahlan [to be among one's people] = to be welcome, 
are two good examples of this phenomenon. Marhaban is a word used for 
greeting, and has the original meaning ‘wide’ (Ibn Manzür 2005: 1472-1473). 
Marhaban is classified as a cognate object, or what is known in Arabic as maf‘ul 
mutlaq. The cognate object is a verbal noun derived from the main verb (Taha 
2011: 1), used after a verb to either describe or emphasize it (Ibn ‘Aqil 1980: 169). 
Given the grammatical class to which marhaban belongs, we can surmise that the 
phrase has a missing element. That element can be defined as a zero element on 
both a syntactic and a semantic level. Syntactically, the accusative case (nasb) 
requires a verb from which the cognate object is derived. Marhaban is therefore 
in the accusative case, as it is governed by al-‘amil (the governor) through taqdir. The 
concept of taqdir can be explained as follows: 


The speaker ‘hides’ things in speech, and it is the grammarian's task to reconstruct these 
hidden elements in order to explain the surface structure of the sentences. The most important 
aim of Arabic grammar is the explanation of the case endings (i‘rab) in the sentences 
that are produced by the action (‘amil) of a visible element in the sentence. If no such 


3 The -n at the end of hanänayn is dropped when the word merges with the pronoun. 
element is available, the grammarian must have recourse to an underlying structure in 
which these elements are made explicit. 
(Versteegh 2011: 1) 


In the case of marhaban, the implicit element is the ‘amil, which is the verb 
arhaba. It is crucial to mention here that the implicit element can be either a zero 
element or a semantic ellipsis. These two potential explanations are both consid- 
ered below. 

First, with regard to ellipsis, an elliptical phrase is one in which some ele- 
ments are omitted, especially if its meaning is supplied by its context (Matthews 
1997: 111). Linguists distinguish between different kinds of ellipsis.⁴ In Arabic, 
there are a number of linguistic phenomena considered to be ellipsis, including 
sluicing, verb-phrase (VP) ellipsis, and noun-phrase (NP) ellipsis (Mughazy 2011: 
2). In sluicing, the omitted element is preceded by a wh-question word, as in example 
(1a), where an omission can be understood from the antecedent, thereby 
allowing the phrase to be interpreted as arada ‘Ali an yadhhaba ilà l-bayti in (1b). 
Sluicing therefore contrasts with NP ellipsis, illustrated in example (2), in which 
the missing element is not a phrase but a single noun. The quantifier can be 
tanwin: the suffix n, or the prefix al- (Mughazy 2011: 2).⁵ In example (2), the omitted 
noun can be interpreted as al-muwazzafin, as in (2b). Lastly, in VP ellipsis, 
the omitted element is the head verb together with its internal argument, the object 
(Mughazy 2011: 2). This type of ellipsis only occurs after auxiliary verbs. No ex- 
amples of VP ellipsis have been identified in classical Arabic, other than in ‘amiy- 
yah (colloquial), which is beyond the scope of this research (Mughazy 2011: 2). 


(1a) ‘Ali arada dh-dhahaba ila l-bayti wa la adri limadha. 
‘Ali wanted to go home, and I don't know why.’ 

(1b) [;‘Ali[i{ypast., Sing., 3* person][[varädaı] ovdh-dhahäb;]] cosjwa negla sAlpres., sing., 1° 
person][vpadri]|[, limatha][ s^ [voA]weA]]] 

(2a) al-mudiru qàbala l-muwazzafina illa l-ba‘d/ba‘da-n 
‘The manager met all the employees except for a few.’ 

(2b) [al-mudiru [i[ipast., Sing., 3* person]|vp gäbala] ov l-muwazzafina] excepilla [os I-ba d] 


[pA] T] 


The above examples, although they are not formulaic, refer to the syntactic sub- 
sentential level, and in the case of a one-element phraseme like marhaba-n, none 


4 There is no agreement about the typology of ellipses. Nevertheless, VP ellipsis, NP ellipsis, 
and sluicing, albeit under various names, are widely acknowledged. 

5 Mughazy (2011: 2) does not mention the prefix al- as a quantifier, although it can be used in 
al-ba‘d for the same purpose, as in example (2). 
of the ellipsis types can be applied. Linguists have proposed two rival explanations 
for this phenomenon. Haddar and Ben Hamadou (1998: 271) referred to it as 
“false ellipsis”, which can be understood without constructing the complete 
form. The same authors claimed that false ellipses “can be resolved at the lexical 
level” (Haddar and Ben Hamadou 1998: 271) and gave two examples of it: da-n 
sa'ida-n (Happy New Year), and an-nära n-nära! (Fire, fire!). The elliptical ele- 
ment in the first example is the verb atamanna (I wish), and in the second, the 
verb ihdhar (be careful). Although their examples indicate more than one lexeme, 
both demonstrate the concept of omitted ‘amil (action) - the case with which we 
are specifically concerned. It may thus be claimed that ellipsis can be understood 
on a lexical level, but further investigation into the syntactic level is nevertheless 
required. 

Stainton discussed two potential modes of analysing/explaining the phe- 
nomenon: a pragmatics-oriented approach, and semantic ellipsis (Stainton 2005: 
386). The first requires that an utterance's “face value” be the main focus of analysis, 
while its pragmatics - i.e., gestures and context - are treated as the responsibility 
of the utterance's receiver, who reconstructs missing elements and fills in gaps 
(Stainton 2005: 387).⁶ Crucially, however, it is not the non-sentential phrases that 
this approach intends to reconstruct; it has no interest in filling in linguistic gaps 
(Stainton 2005: 387). Rather, the non-linguistic context in which the elliptical 
phrase occurs fills the semantic gap in the utterance. 

Analysing the phenomenon of one-word Arabic phrasemes using Stainton’s 
pragmatics-oriented approach therefore leads us to either a) accept or b) reject 
the idea that single words can be phrasemes. The first option, however, must be 
rejected as contradicting the definition of phraseology and its units: for a 
phraseme, by definition, is formed from a phrase, which cannot comprise fewer 
than two elements (in the case of the Arabic language, lexemes and pronouns). 
And in considering the second option, we cannot overlook the fact that one-word 
expressions function as phrasemes in Arabic, and syntactically reflect a missing 
element - the ‘amil (action) - which changes their grammatical case from nomi- 
native to accusative. These cases are marked by case-endings: dammah [suffix u], 


6 Stainton provides two "competing views" of how to explain how the gap is filled. The first 
view, advocated by Barton (2005: 386), “postulates (i) a sub-module of linguistic context, that 
operates exclusively on the sub-sentence uttered plus prior explicit discourse, (ii) a sub-module 
of conversational context, that takes the output of the first sub-module as input, and uses 
non-linguistic context [...] to derive what the speaker meant to convey”. The other view, advocated 
by Stainton himself, is that while gap-filling does occur “via non-deductive inference”, there are 
no pragmatics modules at work, but rather “central system processes, inferential processes not 
specific to language, [that are used] to bridge the gap”. 
waw [suffix ü], alif [suffix a] and nün [suffix n] for nominative, and fathah [suffix 
a], ya’ [suffix i], alif [suffix a], and hadhf [deletion] for accusative. 

Semantic ellipsis occurs when a sentence is elliptic, but the ellipsis can be re- 
constructed by applying the syntactic rules of the language in the absence of an 
uttered antecedent. It differs from the previously described varieties of ellipses in 
that it could potentially explain one-word phrasemes. Although marhab-ta, the 
omitted verb of marhaba-n, does not exist (or at any rate has not been detected) 
in lexicons of the Arabic language, it must still be reconstructed — especially in 
combination with the cognate object marhaba-n - to justify the accusative case 
ending (al-Farahidi 2005: 342).⁷ Stainton defended the pragmatics-oriented approach 
by arguing that the reconstructed phrases may not suit the elliptic phrase, 
and cited the following example. If someone asks ‘Who loves Michael Jackson?’, 
the answer could be ‘Me’. The elliptic part of the phrase ‘Me’ does not suit the 
reconstructed phrase ‘I love Michael Jackson’, since the pronoun in the elliptic 
phrase is in the accusative case, whilst in the reconstructed phrase it is in the 
nominative case. Stainton gives another example in German. A German speaker 
would say ‘mein Vater’ [my [nom.] father], whilst pointing at someone that re- 
minds him of his father; however the reconstructed phrase would be ‘Das erinnert 
mich an meinen Vater’ [that reminds me of my [acc.] father]. However, in the case 
of Stainton's first example, answering the question posed with ‘I’ or ‘I do’ would 
be more grammatically correct English than answering with ‘Me’, even though 
the latter is generally accepted in colloquial usage; i.e., the elliptic phrase could 
originally have been composed with the pronoun in the nominative case. Alter- 
natively, we can view the reconstructed phrase as being the (likewise grammati- 
cally correct) ‘It is me who likes Michael Jackson’. Similar arguments can be ap- 
plied to Stainton’s German example. It should also be noted that ellipsis can be 
used to simplify an utterance, and that therefore, an elliptic phrase can be under- 
stood when the simplest case is used, even if it does not agree with the origi- 
nal/reconstructed phrase. 

The example of marhaban can be better explained via the concept of a zero 
element in a phraseme, given that a phraseme is a set phrase and a phrase by 
definition is more than one word (McGregor 2003: 77-78, 82).⁸ However, a one + 


7 “When al-Khalil (2003: 105) was asked about the accusative case of marhaban he said ‘in it a 
hidden verb’; he meant: dwell or stay, so it became accusative by a hidden verb, then it became 
dead when its [the verb’s] meaning became well-known” (al-Farahidi 2005: 342). 

8 McGregor (2003: 77-119) provides a detailed account of the historical background 
of the concept of the zero-element. He also differentiates between ‘zero’ and ‘nothing’, for zero 
should fulfill two conditions provided by Haas (1962: 49): a) distinctive omission of overt forms, 
zero element phraseme is not to be confused with a lexeme + pronoun phraseme, 
in which the second element is a suffix pronoun, e.g., hananayka [your [dual] 
mercies] = slowly. Though the zero element is the element that does not exist in 
some linguistic cases of a given language, its visible equivalent does exist in the 
majority of language cases (Haas 1962: 34). However, the zero element has an im- 
pact on its linguistic context (Haas 1962: 34). For example, the suffixes -ed and -t 
are morphemes that indicate the past tense in verbs in English, although such 
morphemes do not exist in verbs like cut and put (Haas 1962: 34). However, these 
verbs' tenses can be understood from context; and the absence of an element signifying 
the tense is the zero-element. In Arabic, the zero-element applies to sukün 
(Bishr 1998: 187; Firth 1957: 180-189):⁹ the case-ending used in the absence of 
any of the three case endings fathah /a/, dammah /u/, and kasrah /i/ (Bishr 
1998: 187). Jazm is a syntactic case in which the case-ending is a zero-element 
(sukün) because it demonstrates an absence of the uttered morphemes. In the 
case of a one-uttered-word phraseme, the second non-pronounced element is a 
zero-element of the phraseme. For instance, the verb marhab-ta, which functions 
as the action of the cognate object marhaban, constitutes the zero-element in the 
phrase that forms the phraseme marhaban. 

In conclusion, applying the second criterion of phraseology to Arabic 
phrasemes creates a difficulty that needs to be overcome insofar as Arabic in- 
cludes one-word phrasemes in which one element is uttered and the other is el- 
liptic. This phenomenon is best classified as semantic ellipsis for two reasons. 
Firstly, the uttered element is a cognate object that needs a governor (‘amil) to 
justify its grammatical case. Therefore, a verb that coheres with it is reconstructed 
(taqdir), as in arhab-ka allahu [[may] God [have] you in a spacious [place]], in the 
case of marhaban. Secondly, a one word Arabic phraseme, e.g. marhaban, does 
not constitute a syntactic ellipsis, since in such an ellipsis, the uttered element 
requires a reference to an uttered antecedent, but the action/verb has never actu- 
ally been found in classical Arabic in the context of marhaban. Finally, the appli- 
cation of the concept of the zero-element to one-word phrasemes allows them to 
meet the established definition of a phrase. The elliptic element of the phraseme 
marhaban is a zero-element, as it exists only in parallel phenomena, and it has 
an effect - i.e., the formation of a phraseme - on the existing element. The one- 


and b) overt alternates to this operation. If the potential zero element loses one of those condi- 
tions, it becomes ‘nothing’ rather than ‘zero’. 

9 The concept of a zero-element in Arabic was first introduced by Firth (1957: 180-189) then 
further explored by Bishr (1998: 187). 
word phraseme can thus be defined as a single word that is part of an elliptic 
phrase and therefore functions as a phraseme on its own. 

Another issue that emerges when attempting to apply the second criterion to 
Arabic phraseology is the polylexical phenomenon of naht: two or more words 
that are merged into one, losing some of their letters in order to cohere with the 
structure of the quadriliteral root.¹⁰ For instance, hawqalah is derived from là 
hawla wa là quwwata illà bi-llah [there is no might nor power except in God]: a 
sentence used in prayer or in response to an unpleasant situation. Such words 
function as phrasemes since they adhere to the other criteria; however, they re- 
quire some further explanation. Semantic ellipses and zero-elements cannot be 
applied to the phenomenon of naht, since there is neither any ellipsis nor are 
there any non-pronounced elements. However, the original words are merged via 
contraction. Hence, manhüt is a phraseme written as one word, but composed of 
fragments of other words that together formed a sentence-long phraseme (al- 
Khatib 2003: 439). 


2.3 The Number of Co-Occurrences Required before a Phrase 
can be Considered a Phraseme 


Counting the instances of co-occurrence of a particular phraseme in classical or 
Modern Standard Arabic would normally require the existence of a corpus of rel- 
evant text. In its absence, classical collections of idioms and proverbs including 
Amthal al-‘Arab by al-Mufaddal al-Dabbi (d. 784), al-Durrah al-Fakhirah fi al- 
Amthäl al-Sä’irah by Hamzah al-Asfahani (d. 961), Majma‘ al-Amthäl by al- 
Maydani (d. 1124), and collections of non-figurative set phrases like Thimar al- 
Qulüb fi al-Mudäf wa al-Mansüb by al-Tha‘alibi (d. 1038) and Ma Yu'awwal 'alayh 
fi al-Mudáf wa al-Mudäf Ilayh by al-Muhibbi (d. 1699), are key repositories of 
phrasemes for Classical Arabic and for MSA, which contains a large number of 
Classical Arabic phrasemes. Collections of Classical Arabic books such as Is- 
lamport.com or Shamila could be referred to, in order to measure the number of 
occurrences of the phraseme in Classical Arabic works. Also, the International 
Arabic Corpus is useful for MSA. Additionally, collections of eloquent phrases are 
an important source of phrasemes, reflecting prevalent metaphorical phrases. 
Two examples of this type of lexicon will be referred to: the Jawahir al-Alfaz of 
Qudamah Ibn Ja‘far (d. 949), and the al-Alfaz al-Kitabiyyah of al-Hamathani (d. 


10 The root in Arabic consists of either three letters 1-2-3 (a-k-l) or four letters 1-2-1-2 (w-s-w-s)/1- 
2-3-4 (h-n-z-]). 
939). However, Gries (2008: 5) states only that this number should be 
“larger than [...] expected”, and, as noted above, the lack of a well-established 
corpus of Classical Arabic means that a corpus-based method of identifying a 
phraseme cannot readily be applied. 

Accordingly, analysing the idiomatic level and consulting the collections mentioned 
above would be a feasible methodology for identifying Arabic phrasemes. 
However, distinguishing idiomatic from literal meaning in Arabic can at times be 
problematic because of lexemes that are affected by dead metaphors. Dictionaries 
can be used to track original meaning if two conditions are met. First, the source 
should have been written before the target era. Secondly, the original (i.e. literal) 
meaning of the phraseme’s elements should be indicated, even where they occur 
in a secondary meaning. For example, for phrasemes that occur in 
late Andalusi works, e.g. the works of Ibn al-Khatib, the dictionaries that can possibly 
be used for this purpose are the al-‘Ayn of al-Khalil (d. 736), the Tahthib al-
Lughah of al-Azhari (d. 981), the Maqayis al-Lugah of Ibn Faris (d. 1004), and the 
Taj al-Lugah wa Sihäh al-‘Arabiyyah of al-Jawhari (d. 1003). Additionally, the 
original source domains of the phrasemes would be traced to their possible 
sources, with the aim of gaining a clear indication of their primary semantic level. 


2.4 The Permissible Distance between the Elements of a 
Phraseme 


A phraseme is a set phrase in which the elements cannot be substituted. These 
elements function as one semantic unit by being attached to each other (Gries 
2008: 6). In a restricted-order language, the order in which an element occurs in 
a phrase is important to the reader's understanding of that word's grammatical 
class. Arabic is a free-order language, meaning that the grammatical class of a 
word is not affected by the order of the elements in the phrase in which it appears 
(al-Sirafi, 2008: 263).¹¹ This raises an important question: What are the limits of 
order-change in an Arabic phraseme? To arrive at a definitive answer will require 
thorough analysis. It is reasonable to claim that a set phrase can be considered a 


11 In some cases in Arabic, order is important for the identification of the grammatical class of 
a word: for instance, when a case-ending does not appear because it would render a long vowel 
at the end of a word un-pronounceable. One example of this is daraba Isa Musa [Isa hit Musa]. 
Both Musa and Isa end with long vowels that cannot be pronounced alongside either the case- 
ending of the nominative case /u/, or the case-ending of accusative case /a/. Thus, only word- 
order reveals the meaning of the sentence, based on grammarians' agreement that the subject 
comes before the object. 
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phraseme as long as order-changes do not affect its metaphorical meaning. For 
instance, in the case of the phraseme as-salamu ‘alaykum wa rahmatu l-làhi wa 
barakatuhu (may the peace, mercy, and blessings of God be with you), although 
the literal meaning of the phrase’s element does not express the meaning of greet- 
ing, the phrase is commonly used as a greeting. It can be found in various orders, 
such as salamu I-lahi ‘alayka wa rahmatuhu wa barakatu and ‘layka wa rahmatu 
l-lahi s-salamu. In all three of these versions, despite changes to word-order, the 
phrase retains both the same metaphorical meaning and the same function as a 
greeting. 

The second issue that must be addressed under this criterion is the size of any 
gap between the elements of a phraseme. As discussed earlier, in a broad sense 
and up to a certain point, a gap between the elements of a phraseme can be ac- 
cepted. To identify the specifics of such limits in Arabic, a survey study would be 
required. However, as noted above, the metaphorical meaning of a phrase is the
main criterion for accepting a phrase, regardless of whether it is a phraseme or
has lost its phraseological identity. For instance, ‘ala qawmihā janat baraqish
[Baraqish has harmed her people] is a phraseme used to describe anyone who
hurts their people unwillingly. In the context of an own-goal in a football match,
the commentator might say wa baraqishu hunā naraha janat li-l’asafi ‘ala
qawmihā [and Baraqish, we can see her here, harmed, unfortunately, her
people]. Although the phraseme has been changed syntactically, with Baraqish
this time not a subject but an object functioning as an antecedent of the omitted
pronoun in the verb janat [harmed], it still reflects its original metaphorical
meaning. This, of course, works with sentence-long phrasemes but not with
one-word phrasemes or with lexical idioms (e.g. compounds such as manhūt or tarkib
mazji).


2.5 The Lexical and Syntactic Flexibility of Phraseme 
Elements' Non-Substitutability 


The concept of fixedness of an Arabic phraseme can be examined on two main 
linguistic levels: the syntactic and the lexical. Syntactically, phrasemes that 
“break the conventional grammatical rules” (Moon 1998: 21), known as ill-formed
collocations, are completely fixed. Ill-formed collocations can be idioms, proverbs
or even pragmatic phrasemes. A clear example of an Arabic pragmatic phraseme
that is an ill-formed collocation is murghamu-n akhāka lā batal [your
brother is forced (to do what he has done) not a hero]. Under the conventional
grammatical rules of Arabic, akhāka should be written in the nominative case
(akhūka) as the subject of a passive-voice sentence, or as it is known in Arabic,
naib fa‘il. In this pragmatic phraseme, the two conditions of free phrases are
violated (Mel’čuk 1998: 30). Pronouns in Arabic are mostly morphemes,¹² so in
phrasemes that contain a pronoun, the pronoun changes with context. For
instance, hanānay-ka [your (dual) mercies] is grammatically fixed in the accusative
case, and its pronoun changes depending on the person(s) to whom it is
addressed, as follows:


2MAS-hananay-ka, 2FEM-hananay-ki, 2DUL-hananay-kuma, 2PLUR.MAS-hananay-kum,
2PLUR.FEM-hananay-kunna.


Thus, the degree of fixedness of an Arabic phraseme can be either complete (in
the case of ill-formed collocations/pragmatic phrasemes) or semi-flexible; and its
status as completely fixed or semi-flexible affects whether its pronoun morpheme
varies with context.

The lexical flexibility of an Arabic phraseme depends on the number of elements
it has. Phrasemes with two elements, regardless of whether both are uttered
or one is a zero-element, are completely fixed. Marhaba-n (a lexeme +
zero-element phraseme) and subhana l-lah [exalted is God] (both elements of which
are uttered) are both examples of two-element phrasemes that are completely
lexically fixed. However, the lexical flexibility of phrasemes that are formed of more
than two lexemes is merely restricted rather than excluded, because the
receiver/audience can still comprehend the metaphorical meaning intended by the
phraseme’s formation. Take the phraseme daraba ‘usfūrayni bi hajari-n wahidi-n
[(he) hit two birds with one stone]. If a speaker means to refer to finishing two
or more tasks by performing just one action, he can either use the phraseme as it
is, or change the word ‘usfūrayn [two birds] to ‘amalayn [two tasks], yielding
daraba ‘amalayni bi hajari-n wahidi-n. His audience will comprehend the reference
to the metaphorical meaning because that meaning is still preserved in the
remaining elements of the phraseme. Similarly, if the element ‘usfūrayn remains
while hajari-n [a stone] is changed to another lexeme, such as tawqi‘ [signature]
in the context of, say, paperwork, the phrase now being daraba ‘usfūrayn bi
tawqi‘ wahidi-n, the intended metaphorical meaning of the phraseme will still be
obvious to the Arabic audience. In other words, the lexical flexibility of an Arabic
phraseme is dependent on two conditions: 1) the phraseme must be formed of
more than two elements, and 2) its metaphorical meaning must remain intact.


12 Except when the pronoun is in the accusative case and is either separated from the action or
placed before the action. In these two situations, the pronoun is iyya + (second-person or
third-person pronoun). Pronouns in the nominative case are treated as separate lexemes.

2.6 The Semantic Unity and Unpredictability of a Phraseme


Fully fixed phrasemes are defined by the third and the fourth cases of the formula
provided by Mel’čuk (1998: 30-31):


A [phraseme] AB of a language L is a semantic phraseme of L such that its signified ‘X’ is
constructed out of the signified of one of its two constituent lexemes - say, of A - and a
signified ‘C’ [‘X’ = ‘A ⊕ C’] such that the lexeme B expresses ‘C’ only contingent on A.
(Mel’čuk 1998: 30)


The third case:


‘C’ = ‘B’, i.e. B has (in the dictionary) the corresponding signified; and ‘B’ cannot be
expressed with A by an otherwise possible synonym of B.
(Mel’čuk 1998: 31)


As in the Arabic phraseme: Baytu l-Māl ‘the house of money’ (ministry of finance
in the medieval era).


The fourth case:


‘C’ = ‘B’; ‘B’ includes (an important part of) the signified ‘A’, that is, it is utterly specific, and
thus B is ‘bound’ by A.
(Mel’čuk 1998: 31)


As in the Arabic phraseme: khariru l-mā’i ‘the sound of falling water’.


Mel’čuk’s (1998: 30) formulae illustrate fully fixed phrasemes in which neither
element can be substituted, at a semantic level. In the first formula, bayt as an
individual lexeme means ‘house’, while al-māl means ‘money’. Yet the individual
meanings of the lexemes do not add up to or predict the overall meaning of their
phraseme: ‘ministry of finance’. Moreover, substituting a synonym for either of
these elements will obscure the metaphorical meaning of the original phraseme.
The same phenomenon can be observed with other figurative metaphors, and to
a certain extent with non-figurative ones, e.g. khariru l-mā’i [the sound of falling
water], a specific term for the sound of water such as that of a waterfall.

In the case of khariru l-mā’i, the first element of the phraseme does not
co-occur with any other lexeme, since the semantic field of the first element is
included in the semantic field of the second element. This leads us to deem it a
‘cranberry collocation’, i.e. one of the elements (kharir, in this instance) is
unique to that collocation (Moon 1998: 21). Nevertheless, this unique element can
be replaced by a synonym that gives a broad sense of the target meaning.
Kharir is a special term to indicate the sound of falling water, but if a speaker uses
sawt [sound] in the same context, it will be understood, provided that the hearer
recollects the meaning of the original substituted element, kharir.

In short, Arabic phrasemes occur as single semantic units, and their meanings
cannot be predicted from the individual meanings of their elements. In
non-figurative phrasemes, and in figurative ones (albeit with more difficulty), one of
the elements can have a synonym substituted for it. However, when this happens,
the resultant phrase 1) does not act as a phraseme, and 2) requires the audience 
to recall the original element of the phraseme, in order to understand the seman- 
tic unit that the collocation seeks to provide. 


3 Conclusion 


This paper has explored the challenges that emerge when the established criteria 
for phrasemes are applied to the Arabic language. The first criterion is affected by 
the fact that pronouns in Arabic are considered to be one-letter nouns; and the 
second, by the existence of numerous one-word Arabic phrasemes. The theory of 
the zero-element was found useful in overcoming the latter issue, insofar as a 
one-word phraseme can be construed as having two elements, one of which is a 
zero-element lexeme that was important in the formulation of the phraseme, but 
which no longer explicitly exists. 

With regard to the third criterion, a lack of corpora prevents direct counting 
of the co-occurrence of the elements of a given phraseme in Arabic. We will there- 
fore utilise metaphorical fixedness as a key parameter of the phrasemes sampled 
from that literature, supported by comparison with collections of fixed colloca- 
tions in Arabic. In terms of the fourth criterion, the question of the distance be- 
tween the elements of an Arabic phraseme will require further investigation. 
However, this chapter established that Arabic phrasemes exhibit a degree of flex- 
ibility based on the context, as long as the sixth criterion is fulfilled. 

As to the fifth criterion, an Arabic phraseme can have some flexibility as re- 
gards accepting a substitute element, when the phraseme is formed of more than 
one uttered element and its semantic unity remains intact. Finally, Arabic 
phrasemes fit the sixth criterion in the sense that they occur as single semantic 
units. This criterion also supports the fifth one, by demonstrating the possibility 
of substituting one or more of the elements in a phraseme - but only if the audi- 
ence recalls the original element(s). 

Muhammad A. Badarneh
Formulaic Expressions of Politeness in 
Jordanian Arabic Social Interactions 


Abstract: This study explores the use of politeness formulaic expressions in eve- 
ryday social interaction in colloquial Jordanian Arabic. Analysis of ethnograph- 
ically observed data of ninety-four formulaic expressions within the framework 
of Brown and Levinson’s (1987) classical politeness theory reveals that these for- 
mulae are of two types: positive politeness formulae which are used in interac- 
tional and transactional contexts and emphasize solidarity and communal be- 
longing; and negative politeness formulae concerned with showing deference 
and non-imposition. The use of these formulae reflects speakers’ greater concern 
with positive rather than negative politeness. It further displays the fixity and 
continuity of social norms and traditions transmitted through these formulae. As 
many of these formulae involve reference to God, such formulaicity further em- 
phasizes the religious and fatalistic nature of the community. 


1 Introduction 


This paper investigates formulaicity in colloquial Jordanian Arabic, specifically 
in the domain of everyday social interaction from the perspective of politeness 
theory. Although everyday spoken colloquial Jordanian Arabic is rich in formu- 
laic expressions, these expressions have not received sufficient attention. This 
study, therefore, seeks to address and redress this gap by considering formulaic 
expressions as used in everyday social interactions in this variety of Arabic. 
Formulaic expressions are exploited as ‘conversational routines’, defined by 
Coulmas (1981: 2) as “highly conventionalized prepatterned expressions whose 
occurrence is tied to more or less standardized communication situations”. The 
formulaic expressions examined in this paper constitute “routine formulae” in 
line with Coulmas’s (1994: 1292) definition, that is, they are ‘fixed’ both in form, 
compared to other expressions in the language which are created anew every 
time, and in function, in that they fulfil specific highly recurrent communicative 
tasks. They are conventionalized linguistic formulae triggered by specific com- 
municative settings where their use is expected and deemed appropriate because 
they are seen as part of the speaker’s communicative competence as well as 
his/her everyday politeness behavior. As Bardovi-Harlig (2012: 207) explains,
formulaic expressions “often succinctly capture the illocutionary force of a
contribution by virtue of the fact that the speech community in which they are used has
tacitly agreed on their form, meaning, and use”. According to her review of recent
findings on formulaic language, definitions of formulaic expressions in pragmat- 
ics research usually contain three elements: they are recurrent sequences, they 
occur in specified social contexts, and they are known by members of a speech 
community (Bardovi-Harlig 2012: 207). These criteria will be shown to apply to 
the politeness formulae used in everyday Jordanian Arabic. 

The meanings and uses of some formulaic expressions can be particularly 
appreciated when they are translated into English, showing the sociocultural 
specificity of Arabic as well as the values and beliefs of the speech community. 
While the use of these formulae is not obligatory, failure to use them in the ap- 
propriate context will result in socially negative conversational ‘implicatures’ 
(Grice 1975) and consequently constitute a threat to the interlocutor’s ‘positive 
face’ or ‘negative face’ (Brown and Levinson 1987), which is another reason why 
these conversational formulae deserve attention. 

Formulaic expressions are described as having a stereotyped, routinized, or 
fixed form; conventionalized meanings that include attitudinal and affective con- 
notations; and specialized usage conditions (Hallin and Van Lancker Sidtis 2017: 
69). The use and prevalence of such formulaic expressions in spoken discourse 
has been highlighted by different scholars (e.g. Coulmas 1981; Aijmer 1996; Kecs- 
kés 2003; Wray 2008; Bladas 2012). Formulaic expressions differ from newly cre- 
ated, grammatical utterances in that they are characterized by familiarity and 
predictability, are closely related to communicative-pragmatic context, and are 
widely regarded as crucial in determining the success of social interaction in 
many communicative aspects of daily life (Van Lancker Sidtis and Rallon 2004; 
Van Lancker Sidtis 2010). Native speakers can recognize and complete such for- 
mulaic expressions (when words are omitted) as well as demonstrate knowledge 
of their specialized meanings and appropriate contexts (Van Lancker Sidtis and 
Rallon 2004: 208). Tactical use of speech formulae is even honored in some lan- 
guages, as reported by Tannen and Oztek (1981) regarding Turkish and Greek. 
This value and special status of formulaic expressions in social interaction has
led to renewed interest in formulaic language as a large and vibrant part of
language competence (Coulmas 1994; Kuiper 2004; Pawley 2007; Wray 2008).
Formulaicity has consequently attracted growing attention in diverse discourse
contexts, mainly as a result of the burgeoning interest in pragmatics and the
embracing of spoken text by sociolinguists and discourse analysts. The present
study is carried out in the spirit of this interest and in light of the use of spoken
discourse as an important source for investigating formulaic language and speech.

While formulaic expressions have been studied in different languages (e.g. 
Dogancay 1990; Takekuro 1999; Overstreet and Yule 2001; Terkourafi 2002; Sa- 
beri 2012; Levin 2014), they have not received as much attention in Arabic, and 
despite the prevalence of formulaic expressions in everyday Arabic social inter- 
actions, only a limited number of studies have investigated their communicative 
and pragmatic functions. A pioneering study in this area is Ferguson’s (1983) 
work on the pragmatic features of what he called ‘God-wishes’ in Syrian Arabic, 
which are formulaic expressions that begin or are assumed to begin with ‘God’, 
and whose semantics is only peripherally related to their actual uses. 

A number of studies have focused on the use of other God-related formulae 
that are originally religious expressions but have undergone pragmatic transfor- 
mation, thus acquiring new discourse pragmatic functions in everyday Arabic 
speech, notably the two expressions that have an iconic status in spoken Arabic: 
insallah (Gregory and Wehbe 1986; Farghal 1995; Clift and Helani 2010) and 
masaallah (Migdadi et al. 2010). These expressions are prototypically illustrative 
of a unique language feature that the Arabic language possesses, namely the so- 
called ‘Allah Lexicon’, which is a rich and varied body of religious expressions 
invoking the Almighty (Morrow 2006). Other researchers have analyzed broader 
and diverse aspects of Arabic formulaic expressions, such as their literary and 
textual sources (Müller 2000; Baccouche 2007), their use in relation to body parts 
in colloquial Arabic (Kotb 2002), their translation (Al-Qinai 2011), their transfer in 
cross-linguistic contexts (Ramajo Cuesta and Ainciburu 2015), and their sociolin- 
guistic and pragmatic functions (Kamel 1993; Badarneh 2016). 

The present study approaches Arabic formulaic expressions from a polite- 
ness theoretical perspective. It attempts to answer the following research ques- 
tions: 

- What role do Jordanian Arabic formulaic expressions play in social interaction
  in general and politeness in particular?

- What do these formulae tell us about the politeness orientation and face
  concerns of Jordanians?

- What do these formulae reveal about interactants’ local sociocultural and
  religious values and assumptions?


2 Theoretical Framework


The interpersonal meanings of formulaicity have been regarded as a discourse 
phenomenon because they can only manifest themselves within some concrete 
context (Norrick 2003: 86). Since such interpersonal meanings affect the align- 
ment of conversational participants and their interpersonal relationships, the use 
of formulaic expressions as part of the poetics of everyday talk can be approached 
from the perspective of politeness behavior (Norrick 2003: 86). Relevant to this 
concern is the politeness theory of Brown and Levinson (1987) that is predicated 
on the sociological and highly abstract notion of 'face', derived from Goffman 
(1967) and from the English folk term linking face with embarrassment or
humiliation. This ‘face’ is defined as “the public self-image that every member wants to
claim for himself”, which makes face “emotionally invested” (Brown and
Levinson 1987: 61). This concept of face consists of two aspects: positive and negative
face. Positive face refers to participants’ “perennial desire that [their] wants (or 
the actions/acquisitions/values resulting from them) should be thought of as de- 
sirable" (Brown and Levinson 1987: 101). Preserving the positive face of others 
would thus result in positive politeness, which involves the choice of strategies 
that emphasize solidarity with the addressee. These include claiming ‘common 
ground' with the addressee and satisfying the addressee's wants (Brown and Lev- 
inson 1987: 101-129). Negative face refers to an individual's “want to have [their] 
freedom of action unhindered and [their] attention unimpeded” (Brown and Lev- 
inson 1987: 129). Maintaining this negative face of others would thus lead to neg- 
ative politeness, which is linguistically realized through strategies that empha- 
size deference for the addressee, such as the use of conventional indirectness, 
hedges on illocutionary force, polite pessimism, e.g. about the success of a re- 
quest, and emphasizing the relative power of the addressee (Brown and Levinson 
1987: 130). Thus, negative politeness constitutes “rituals of avoidance" (Brown 
and Levinson 1987: 129) in polite interaction. In this theory, face is taken to be a
universal notion in all human societies, and conversational participants are as- 
sumed to be rational agents who will ideally seek to preserve both their own face 
and their interlocutor's face in a verbal interaction. While stressing this univer- 
sality of face, Brown and Levinson (1987: 13) recognize that in any particular so- 
ciety face can be "subject to cultural specifications" and naturally links up to 
“fundamental cultural ideas about the nature of the social persona, honor and 
virtue, shame and redemption and thus to religious concepts". 

In such verbal interaction, different speech acts used by conversational par- 
ticipants intrinsically threaten face, such as criticisms, requests, and disagree- 
ments, referred to as ‘face-threatening acts’ (FTAs). A variety of FTAs can threaten 
the addressee’s positive face, such as accusations, disagreements, and
disapproval, while other FTAs can threaten the addressee’s negative face, such as
orders, advice, and warnings. A number of FTAs can be damaging to the speaker’s
own positive face, such as apologizing, while other FTAs can threaten the spea- 
ker’s own negative face, such as accepting apologies. Given that both the speaker 
and the addressee seek normally to preserve face, an FTA that is damaging to the 
addressee’s face will be also a potential threat to the speaker’s face, and vice 
versa (Brown and Levinson 1987: 65-68). To minimize the threat of such FTAs, 
Brown and Levinson (1987) propose a set of strategies that are ordered along a 
continuum of remedial action, suggesting that the more threatening an FTA is, 
the more polite the strategy one must use to reduce its damaging effects. In this case,
the speaker uses a ‘face-saving act’ that involves either positive politeness re- 
dress or negative politeness redress, with negative politeness redress ranked 
higher than positive politeness redress. In contrast with negative politeness 
where “the sphere of relevant redress is restricted to the imposition itself, in pos- 
itive politeness the sphere of redress is widened to the appreciation of alter’s 
wants in general or to the expression of similarity between ego’s and alter’s 
wants” (Brown and Levinson 1987: 101). 

Although Brown and Levinson (1987: 43) play down the importance of polite- 
ness routines by stressing the ‘generative’ production of linguistic politeness, 
they nonetheless state that polite formulae clearly form an important focal ele- 
ment in folk notions and in the distinction between ‘personal’ tact and ‘posi- 
tional’ politeness, where the latter is associated with formulaic decorum (Coul- 
mas 1979, 1981), of the type that is investigated in this paper. 

This approach to linguistic politeness, which is pragmatic, is contrasted with 
a recent approach, described as post-pragmatic or ‘discursive’ (Watts 2003: 9), 
which constitutes a non-contextual paradigm of politeness and a complete de- 
parture from the static classical view of politeness adopted by Brown and Levin- 
son (1987). This approach is based on a dynamic view of politeness, arguing that 
politeness is negotiated by the speaker and the hearer. Accordingly, politeness 
becomes discursive and negotiable, to the extent that “no linguistic expression 
can be taken to be inherently polite” (Locher and Watts 2005: 16), and hence po- 
liteness is theorized as the product of the evaluations of interactants in a partic- 
ular speech event rather than assessment based on context. However, as argued 
by Schlund (2014: 5), this assumption may be suitable to a general theory of social 
practice, but it is not sufficient in linguistic terms because it does not provide a 
theoretical framework for the analysis of the structure of linguistic politeness de- 
vices, like the formulaic expressions examined here. Moreover, “the speakers of 
a given language would simply learn and thus know that certain linguistic
politeness patterns stereotypically occur in certain speech situations” (Schlund 2014:
277).


3 Data 


The data on which this paper is based consist of 94 formulaic expressions that 
were ethnographically observed in different everyday interactions in Jordanian 
Arabic. These formulae were collected by the researcher in naturally occurring 
interactions and exchanges involving native speakers of Jordanian Arabic. The 
observed interactions in which the formulaic expressions were used involved talk 
in a variety of conversational settings, such as interactions in supermarkets, gro- 
cery stores, restaurants, coffee shops, and shopping malls, interactions on pri- 
vate and public occasions, and interactions among family members, in-laws, 
friends, acquaintances, and strangers, which involved both interactional and 
transactional talk. The instances of formulae were checked against the research- 
er’s own sociocultural background knowledge, expertise and repertoire as a na- 
tive speaker and as a member of the community, as well as against the knowledge 
and expertise of other native-speaker community members who were consulted 
regarding the use and accuracy of each formula. All formulae collected were ver- 
ified as authentic expressions that are recognized and used in actual social inter- 
action. No formula displayed cross-functioning (Moon 1992: 21-22), that is, no for- 
mula was found to be “used with a function other than and additional to its pri- 
mary one”. 

A number of the formulaic expressions collected behave like an ‘adjacency 
pair’ (Schegloff and Sacks 1973). That is, the formula becomes the first part of an 
automatic sequence where the utterance of the formula immediately creates an 
expectation of the utterance of a second part, i.e. a second formulaic response. In
contrast, a number of formulaic expressions in the data do not create this expec- 
tation of a second-part formulaic response, and the addressee has thus the option 
to respond in a non-formulaic way. Given that these routine formulae are “situa- 
tion-bound utterances” (Kecskés 2003), they are standardly taken to have a spe- 
cific meaning and use in a specific context, beyond which they cannot be used. 
The occurrence of a formula, and thus its function, is therefore predetermined by 
a specific context that is communally agreed upon by the interactants. 

It is important to point out that there are no corpora, electronic or otherwise, 
available for spoken Jordanian Arabic. This represents a methodological obstacle 
in the study and analysis of formulaic expressions and patterns in the spoken 
data of this Arabic variety. Ethnographical observation was chosen, therefore, to
obtain the data. While this method has disadvantages such as time constraints, 
reliability, and access, it has a variety of advantages, especially direct observa- 
tion of how the formulae are used, the ability to have a holistic view of these for- 
mulae, validity of the data as first-hand evidence, and being ecologically sound 
as the author is a member of the community where these formulae are used. Eth- 
nographical observation would thus provide an effective method for the study of 
formulaic patterns in a given community as the researcher would better under- 
stand how such patterns are used in naturally occurring exchanges and be able 
to explicate the sociocultural meanings involved in their use. 


4 Analysis 


As stated above, each formula in Jordanian Arabic is situation-bound and con- 
text-specific. Accordingly, each formulaic expression was approached and ana- 
lyzed according to the function it performs in the given (predetermined) commu- 
nicative context. In addition to providing the predetermined context, the analysis 
of these formulae involved explicating their semantics, use and function, the typ- 
ical response to the formula by the addressee, how the formulae reflect the soci- 
ocultural values and assumptions of the community at large, and the aspect of 
politeness with which these formulae are concerned, namely, positive or negative 
face of the speaker, the addressee, or both. Literal translations of these formulae 
are provided in order to capture their local sociocultural meanings and flavor, 
which are then explicated in more detail in the analysis. 

The analysis of the data reveals that formulaic expressions used in Jordanian 
Arabic fall into two categories. The first category is oriented toward the positive 
face of the addressee or audience to communicate solidarity and common 
ground, and this category consists of two types: interactional formulae and trans- 
actional formulae. The second is concerned with the negative face of the ad- 
dressee or audience to emphasize deference and non-imposition. 


4.1 Positive Politeness Formulae 


The great majority of the formulaic expressions in the data (N = 76; 80.9%) were
found to be oriented toward positive politeness, which is in line with the
observation that Arab societies tend to favor positive politeness (e.g. Davies 1987).
Formulae oriented toward the addressee’s positive face are designed to
communicate that the speaker and the addressee are familiar to one another. Thus, these
formulae serve as an inclusive, in-group membership marker whereby the ad- 
dressee is considered to be an insider treated as someone who belongs to the 
same community and shares the same sociocultural values. Positive politeness 
formulae in Jordanian Arabic attending to different aspects of the addressee's 
positive face can be categorized in terms of situational appropriateness into two 
types of contexts: interactional and transactional. 


4.1.1 Interactional Formulae 


Almost all formulaic expressions in the current data (N = 91) are employed in pri- 
marily interactional (Brown and Yule 1983: 1) contexts, including social occasions 
involving festive and celebratory social bonding, such as public or private invita- 
tions to food on occasions such as weddings, graduation, house-warming, and 
child's birth invitations. In such interactional contexts, formulaic expressions are 
oriented toward focusing on the participants and their social needs, they are
interactive, requiring two-way participation, and they reflect the participants’
sociocultural and religious identity. Being interactionally oriented, these formulaic
expressions can thus serve to “establish and maintain social relationships” and
“negotiate role-relationships, peer-solidarity, the exchange of turns in a
conversation, [and] the saving of face for both speaker and hearer” (Brown and Yule
1983: 3).

One of the domains where positive politeness formulaicity is exploited is
hospitality, which is considered an inherent and hallowed ritualized tradition of
Jordanian society (Shryock 2004: 35). Consider the following instances of
hospitality-based formulae:¹


(1) Host: ahlan wa-sahlan
‘Family and smooth land’
Guest: bi-l-mahlli
‘(Same) to one who says “welcome”’

(2) Host: nawwarat
‘(Our home) has lightened’
Guest: mnawrah bi-shab-ha
‘(It is already) lightened with its owners’

(3) Guest: (Praises host’s food)
Host: sihtayn w-‘āfiyeh
‘(May the food be) double health and healthiness (to you)’
Guest: ‘ala qalb-ak
‘(Same) to your heart’

(4) Host: (to someone who shows up while the host is eating): intah fal-ak
‘Hit your good omen’
Guest: (accepts or declines)

1 All translations are the author’s unless otherwise indicated.


In (1), the speaker, i.e. the host, uses the traditional greeting formula ahlan wa- 
sahlan, typically functionally translated as ‘welcome’, and sometimes reinforced 
by wa-marhaban ‘and hello’, upon seeing the addressee, i.e. the guest. This for- 
mula is a prototypical politeness device in Arabic that is patently oriented toward 
the addressee’s positive face, specifically his/her need to feel welcome upon be- 
ing seen by the host. This can be further appreciated when considering the origi- 
nal expression from which the present formula derives, namely atayta ahlan wa- 
wati’ta sahl-an, which literally means ‘You have come upon family and treaded 
on smooth land’. Thus the now shortened formula carries the meaning ‘You are 
among people who are (like) your family and in a place that is hospitable to you’. 
The guest’s one-word response bilmahlli ‘same to one who says welcome’ is equal- 
ly formulaic and is designed to communicate equal positive politeness toward the 
host. Hospitality is thus formulaically shown to be central to the habitus (Bour- 
dieu 1991: 37) of the local culture and its assumptions about the rights, needs and 
obligations of the host and the guest, and at the same time the importance at- 
tached to ‘verbal generosity’, combined of course with ‘food generosity’, toward 
one’s guest or visitor is demonstrated. 

In example (2), the host formulaically greets and welcomes the guest using 
the expression nawwarat. This formula metaphorically treats the guest as a 
source of ‘light’ that has eliminated ‘darkness’ in the host’s home, which invokes 
the host’s conceptualization and evaluation of the guest’s visit and presence as 
highly desirable and conducive to joy. As nawwarat constitutes a compliment for- 
mula, the guest responds with another formulaic expression that is designed to 
reflect the guest's modesty and deflect the compliment by attributing such meta-
phorical ‘light’ to the host, that is, the owner and inhabitant of the house, which
reduces the force of the compliment and shows the guest's appreciation of the 
host's invitation. 

The health-wishing formula sihtayn w-‘āfiyeh in (3) is invariably used to convey the
speaker's wish that the food have a salubrious effect on the guest, typically after food
is served. The fact that the formula consists of the dual sihtayn ‘literally, two
healths’ reinforced by its synonym ‘āfiyeh ‘healthiness’ reflects the formulaic aspect of
the home hospitality ritual, showing the strong emphasis placed on, and the importance
assigned to, this ritual in the local culture. The guest's response ‘ala qalbak ‘lit.
(same) to your heart’ is equally formulaic and reflects the guest's similar health wish
for the host. However, through metaphorical reference to the host's ‘heart’, the guest
shifts the emphasis from his/her own physical health to the host's psychological
well-being to reciprocate the host's positive politeness toward the guest.

Formulaicity thus iconically plays an important role in the realization of hos- 
pitality (and other traditions of the society). That is, the fixedness that comes with 
such hospitality formulae seems to reflect the permanence and endurance of this 
ritual in society, whereby these formulae are constantly reproducible in any con- 
text of hospitality. This can further be seen in example (4) where the formula 
intah falak ‘literally, hit your good omen’ is used as an invitation to someone who 
shows up unexpectedly while the speaker is eating. The words of the formula, 
and their literal meaning, reflect its Bedouin origin, taken to be the provenance 
of the tradition of Arab hospitality. More importantly, the metaphorical nature of 
the formula shows a concern with avoiding directness in inviting or asking an 
unexpected guest to join the host while eating. Through such metaphorical for- 
mulaicity, the host implies that the unexpected guest has not come for the sole or 
primary purpose of eating food, something that is very much avoided in local cul- 
ture. It shows the speaker to be a competent community member who shows a- 
wareness of the addressee's face. 

Another area where positive politeness formulaicity operates in everyday Jor- 
danian Arabic is death-related discourse. While they show respect to the de- 
ceased, such ‘death formulae’ display solidarity with and provide solace to the 
audience affected, directly or indirectly, by the death event, such as family, rela- 
tives, and friends of the deceased (see e.g. Parvaresh and Capone 2017). While 
some of these formulae are inextricably linked to religious (i.e. Islamic) beliefs 
about death, other formulae constitute a colloquial form of prayer for the ad- 
dressee or audience to live a longer life than the deceased, hence their positive 
politeness value in discourse: 

(5) Speaker: ‘addama allah-u ajra-kum
‘May Allah increase your reward!’
Addressee: Sakara allah-u sa‘ya-kum
‘May Allah thank your efforts!’

(6) Speaker: yirham mà faqadit
‘May mercy be upon the one you lost!’
Addressee: mada tufqud gali
‘May you not lose a loved one!’

(7) Speaker: yislam rās-ak
‘May your head be safe!’
Addressee: allah yisalm-ak
‘May Allah keep you safe!’

(8) Speaker: (Name of deceased) a‘ta-k ‘umr-uh / ‘umr-ha
‘(The deceased) has given you his/her life’


As can be seen, in addition to reflecting sociocultural assumptions about death, 
these death formulae are adjacency pairs characterized by their reciprocal na- 
ture, where the use of one formula requires the obligatory use of a specific, fixed 
response. The formula in (5) is the formal Classical Arabic expression of offering 
condolences, still used today especially when the setting is public. It is purely 
religious as it involves a prayer to Allah to multiply the divine reward, or ajr, for 
the person affected by death in his/her family. This divine reward is believed to 
be given to the person who patiently endures, rather than expresses dissatisfac- 
tion with, the pain of losing someone. The addressee reciprocates by praying that 
the speaker be ‘thanked’ by Allah for their efforts of coming and offering condo- 
lences, so the act of thanking is metaphorically made by God rather than the 
speaker him/herself. Example (6) is a colloquial, and hence less formal, formula. 
It is predicated on the two concepts of ‘loss’ and ‘mercy’ where divine mercy is 
invoked upon the deceased, and where the second part is a prayer that the 
speaker will not go through similar loss of a loved one. 

The formulaic expression in (7) moves away from the divine and the religious toward a
colloquial, socioculturally grounded mode of expression. More specifically, positive
politeness toward the addressee is communicated metonymically by wishing that the
addressee’s ‘head’ be safe. The choice of this part of the body to refer to the person
is motivated by the status of the ‘head’ in Arab culture as a symbol of life itself
(being alive). Focusing on these meanings shows how the formula is oriented toward the
positive face of the addressee.

While the death formulae in (5-7) are solace-providing, the one in (8) is used 
when communicating a death event to someone. Rather than using a neutral 
term, e.g. twaffa ‘passed away’, the formula a‘tak ‘umruh / ‘umrha figuratively
transforms the death of someone into an extended new life given to the bereaved 
or the news recipient, as if the lost life of the deceased will result in a longer life 
given to the news recipient. The formulaic transformation of death into a new life 
for the addressee thus becomes a positive politeness strategy, clearly implicating 
solidarity with the addressee and wishing that he/she lives a longer life. Such 
formulaicity in delivering news of death thus avoids mention of death terms, and 
at the same time offers emotional support to the recipient of the news of death. 

Apart from the major discourses of hospitality and death above, formulaic 
expressions are employed as compliment speech acts in mundane everyday talk 
by way of orienting toward the addressee's or audience's positive face, as in the 
following examples: 


(9) Speaker: na‘iman!
‘(May it be) a bliss!’
Addressee: allah yin‘im ‘alayk!
‘May Allah give you bliss (too)!’

(10) Speaker: Sarwa man ‘indi / Sarwa il-hadrin
‘As praiseworthy as those sitting with me / As praiseworthy as the audience’
Addressee/Audience: wa-la t-hūn
‘May you not be belittled’

(11) Speaker: tihri w-tjaddid
‘May your clothes wear out and you renew them’
Addressee: tislam / t‘iš / w-ilgayil
‘(May you) be safe / (May you) live (a long life) / (Same to) the speaker’


In these examples, the formulae serve either as a compliment or a compliment response in
a predetermined context. In (9), the formula na‘iman! ‘(May it be) a bliss!’ is used
when meeting someone who has had a new haircut or has just taken a shower, and its
absence in either context may be interpreted as a threat to the positive face of the
person in question. The formula Sarwa man ‘indi or its variant Sarwa il-hadrin ‘As
praiseworthy as you / as the audience’ in (10) is obligatory when the speaker
compliments an absent third party in front of the addressee or audience. The formula is
used to eliminate any suggestion or hint that the speaker is criticizing,
underestimating, or negatively evaluating the addressee/audience by implicit comparison
with that absent third party. Thus, the formula preemptively shows the speaker’s
awareness of the audience/addressee’s positive face, communicating the implicatures ‘no
critical comparison is intended’ and ‘you are equally good’.

The formula tihri w-tjaddid ‘May your clothes wear out and you renew them’
in example (11) is used as a compliment toward someone who has just bought 
new clothes. This compliment formula is not directed toward the new clothing 
item itself, but rather toward the person wearing it. By wishing that the addres- 
see’s new clothes wear out fast so that the addressee can buy yet more new 
clothes in the future, the formula expresses the wish that the addressee live a long 
life in which he/she can and will always buy and put on new clothes. This in turn 
implies that the speaker asks for constant renewal in the addressee’s life, a re- 
newal symbolized by the act of getting new items of clothing, so new clothes be- 
come a metaphor for renewal, i.e. prolonging, of the complimentee’s life. The lat- 
ter’s understanding of ‘prolonged and renewed life’ is reflected in the three pos- 
sible responses to the formula: tislam, which wishes the speaker safety and pro-
tection; t‘iš, which wishes the speaker a long life; and w-ilgayil, which is function-
ally similar to English ‘right back at you’, suggesting a more casual and easygoing
tone. In these examples, the addressee shows agreement about praiseworthiness 
but with praise formulaically shifted back to the speaker in order to express com- 
mon ground with, and mutual liking toward, the speaker. 


4.1.2 Transactional Formulae


In contrast with interactional positive politeness formulae, only three formulaic
expressions in the data were found to be used in transactional contexts that are
intrinsically concerned with the transmission of information, e.g. price, rather
than the maintenance of social relationships (Brown and Yule 1983). Brown and
Levinson (1987: 103) insightfully argue that "positive politeness utterances are 
used as a kind of metaphorical extension of intimacy, to imply common ground 
or sharing of wants to a limited extent even between strangers who perceive 
themselves, for the purposes of the interaction, as somehow similar". The trans- 
actional formulae discussed here are illustrative of this point. Through using 
such formulae, the speaker, i.e. the salesperson or shopkeeper, implies that, de- 
spite the business-oriented nature of the transactional talk, the salesperson still 
cares about the social dimension of the transactional relationship with the ad- 
dressee, i.e. the customer. By using such formulae, the speaker will imply to the 
customer that the transaction is not purely materialistic or driven by mere finan- 
cial gain, and that the salesperson is interested in establishing or maintaining a 
social relationship with the customer. This is illustrated by the following two 
commonly used transactional formulae: 


(12) xalliha ‘alayna
‘Make it on us’

(13) m‘awwadāt / m‘awwadin
‘May (what you paid) be compensated’


The formula in (12) is commonly used by a salesperson or shopkeeper when the 
customer hands over the money to pay for the goods or service. Although the for- 
mula is of course understood by the customer as an “ostensible speech act” (Link 
and Kreuz 2005: 227) of suggesting that the customer take the goods for free and 
therefore cannot be taken seriously, its use is widespread among shopkeepers, 
barbers, and local service providers in Jordan as a way of establishing or main- 
taining a social relationship with the customer, thus merely serving as a positive 
politeness gesture toward the customer. Realizing the nonserious and insincere 
(Isaacs and Clark 1990: 493-494) nature of this formula, the response of the cus- 
tomer would be, of course, something to the effect of Sukran ‘thank you’. The for- 
mula thus becomes a positive politeness technique used as a kind of 'social ac- 
celerator' whereby the speaker indicates that he/she wants to ‘come closer’ to the 
addressee (Brown and Levinson 1987: 103). 

In contrast with (12), which is uttered before taking money from the customer,
the formula in (13), m‘awwadāt, or its variant m‘awwadin ‘May (what you paid)
be compensated’, is commonly uttered by the shopkeeper after the customer has
paid for the goods. It is essentially an invocation (to God) that the customer may
earn new money to replace the money he/she has just spent. While the speaker
does not have to say this formula, uttering it will communicate that the speaker, 
even if not uttering the formula with total sincerity, at least sincerely wants to 
satisfy the customer’s positive face (Brown and Levinson 1987: 101). Using the 
formula gives the transaction an interactional flavor and shows interest in the 
customer as a social actor rather than just a paying customer. 


4.2 Negative Politeness Formulae 


Formulaic expressions in everyday Jordanian Arabic constitute an important re- 
source for satisfying the addressee’s negative face wants by showing respect to- 
ward and non-imposition on the addressee or audience, and communicating the 
speaker’s concern not to invade their private space. Although these negative po- 
liteness formulae constitute only 19.1% of the data collected (total 18), they cover 
different interactional aspects, as illustrated by (14-19) below: 


(14) ba-la zugrah
‘Without smallness’

(15) w-inta b-karamah
‘And you are in dignity’

(16) w-inta l-kabir
‘And you are the big one’

(17) w-inta l-sadig
‘And you are the truthful one’

(18) wa-la ta‘lim ‘alayk
‘And no teaching of you’

(19) harjak ‘ala rāsi
‘Your talk is on my head’

As these formulae are addressed to negative face, they lie at “the heart of respect
behavior” (Brown and Levinson 1987: 129). The formula bala zugrah in (14) is invoked
when asking someone about his/her name or identity. The function of the formula is to
minimize or eliminate the addressee's sense of being unknown or insignificant. That is,
asking the addressee about their name or identity socioculturally communicates that the
addressee is not known (hence zugrah ‘smallness’) in the community, so the formula
serves to redress or eliminate that implicature.

The formula in (15) is called upon whenever a speaker mentions something 
that is socioculturally deemed offensive or evocative of unpleasant images. The 
mention of such a thing without using the hedging formula w-inta b-karamah 
‘and you are in dignity’ is considered an invasion of the listener’s auditory space 
and hence a threat to their negative face, interestingly signified by the word 
karamah ‘dignity’. Examples of topics with which this formula must be used in- 
clude mention of or reference to shoes, the toilet, feces, urine, and certain ani- 
mals such as dogs, donkeys, and mules as these entities socioculturally symbol- 
ize, or are associated with, inferiority. 

The formula w-inta l-kabir in (16) is used in a requesting context whereby the 
speaker (i.e. the requester) communicates to the requestee that the request made 
does not in any way mean or suggest that the requester has any power over, or 
has a higher status than, the requestee or that the requestee is under any obliga- 
tion to comply with the request. In fact, the formula shifts a higher status onto 
the requestee (‘you are the big one’), thus eliminating any sense of imposition 
upon, or intention of disrespect toward, the requestee. The appeal to kabir ‘big’ 
shows how the formula is used as a strategy for softening requests in a kinship- 
based society like the Jordanian one (see Brown and Levinson 1987: 117-118) 
where there is a strong sense of social hierarchy. 

The formula w-inta l-sadig ‘and you are the truthful one’ in (17) is employed
in disagreement, an act that is typically seen as confrontational and therefore
dispreferred, and that should be mitigated or avoided (but see Sifianou 2012).
The use of this formula stems from the need to minimize the face-threatening act 
of disagreement. Thus when the speaker disagrees with, contradicts, or corrects 
something that a participant has said, he/she softens such disagreement, contra- 
diction or correction by formulaically communicating the message ‘although 
what you said is inaccurate or incorrect, you are a truthful person and it is not my 
intention to accuse you of lying’. 

The use of the formula wa-la ta‘lim ‘alayk ‘lit., and no teaching of you’ in (18)
constitutes a sociopragmatic norm in the specific context of giving advice or
making a suggestion (see DeCapua and Dunham 2007). In local Jordanian cul-
ture, giving advice or instructing the addressee on how to perform a specific ac-
tion or how to behave in a certain situation is considered highly face-threatening 
as it implies lack of knowledge or expertise on the part of the advisee. Therefore, 
this formula is used in such contexts as a preemptive strategy to deny any inten- 
tion of arrogance or condescension on the part of the speaker. By using this for- 
mula, the speaker conveys that he/she is not assuming epistemological superior- 
ity over the addressee. 

Finally, the formula harjak ‘ala rāsi ‘lit. your talk is on my head’ in (19) is
used in interruptions, often perceived as intrusive as they involve blocking of the 
flow of the current speaker’s talk. This formula can be described as a meta-inter- 
ruptive speech act whose use signals interruption of the current speaker, and as 
a palliative offered by the interrupter showing recognition that some infringe- 
ment of the current speaker’s rights has occurred. As the interruption is initiated, 
the formula is immediately invoked to communicate lack of intention to cause 
any threat to the current speaker’s negative face, specifically the right to be heard 
and listened to. The formula is based on a metaphorical positioning of the current 
speaker’s talk ‘on the head’ of the interrupting speaker where ‘head’ here sym- 
bolizes the highest degree of respect in Jordanian culture (cf. example 7). While 
the act of interruption occurs, the formula signals a preservation of negative face 
to communicate that no threat to the current speaker’s autonomy and control 
over their talk turn is intended, thus making the act of interruption sound less 
disaffiliative and more affiliative. 


5 Conclusion 


The present chapter has provided a pragmatic account of the use of formulaic 
expressions in everyday Jordanian Arabic, grounded in the classical face-based 
politeness theory of Brown and Levinson (1987). These formulae constitute an
integral part of everyday communication. They are important because they are
oriented towards preserving both the positive and the negative face of the
participants, thus ensuring the smooth flow of communication and adherence to
fundamental sociopragmatic principles and rules.

Given the important and sensitive aspects of managing both the speaker’s 
and the addressee’s face in social interaction, the present study shows how for- 
mulaicity has come to play an important role in polite social interaction. As em- 
phasized by Terkourafi (2002: 196), formulaic expressions “provide ready-made 
solutions to the complex and pertinent problem of constituting one’s own and 
one’s addressee’s face while simultaneously ensuring that one’s immediate goals 
in interaction are achieved”. The use of everyday Jordanian formulae as dis-
cussed in this paper supports the notion that formulaic expressions have an im-
portant role in the speaker effectively assuming the role of social actor as these
formulae “embody accepted ways of responding verbally to a variety of situa-
tions” and therefore their use becomes “a strong indication of belonging, social
identity or acculturation” (Coulmas 1994: 1293). As Terkourafi (2002: 196) main-
tains, this characteristic of formulaic speech makes it conducive to maintaining 
one's face through demonstrating familiarity with the norms of the community to 
which the speaker belongs. The present study thus supports the argument that 
formulaic speech carries the burden of polite discourse and the prediction that 
“the use of formulae may be a prominent feature of polite discourse in any cul- 
ture", which needs "further quantitative studies of polite discourse across cul- 
tures" (Terkourafi 2002: 197). The present study has sought therefore to demon- 
strate this connection between formulaicity and politeness by examining for- 
mulae whose use is oriented towards preserving the participant's face in the un- 
derexplored language and culture of Jordanian society. 

Formulaicity in social interaction as explored in this paper across specified, 
socially recognized and ratified communicative contexts suggests that such for- 
mulaicity reflects the fixedness of the norms and traditions of Jordanian society. 
This formulaicity further reflects the positive politeness leanings of Jordanians, 
as the majority of these formulae are oriented toward positive rather than nega- 
tive face. Accordingly, through these formulae one can see more concern with 
solidarity and acquaintance, collectivist satisfaction, and communal belonging, 
as opposed to individualism and personal space. 


Part III: Linguistic Varieties Used in Spoken Domains and/or Regarded as
‘Conceptually Oral’


Joanna Szerszunowicz 

New Pragmatic Idioms in Polish:
An Integrated Approach in Pragmateme Research


Abstract: The general aim of the paper is to discuss the purposefulness and usefulness
of adopting an integrated approach in research on pragmatic idioms, i.e. conventional
expressions used in recurrent situations, also called routine formulae or pragmatemes.
Used mostly in spoken language, such units tend to be neglected in phraseological
analyses, although they
are very important from a communicative perspective. In order to provide a com- 
prehensive analysis of this group of idioms, one needs to analyze not only the 
linguistic aspects, but also to take into consideration other factors, for instance, 
the cultural background. The specific aim is to analyze the implementation of the 
proposed approach in the studies of selected recent Polish idioms of pragmatic 
character which came into existence after 1989, the year of Poland’s political and 
economic transformation. 


1 Pragmatic Idioms as Objects of Phraseological 
Research 


Recent years have witnessed an increase in interest in phraseology understood in 
the broad sense of the term. The expression phraseological unit tends to be used 
as an umbrella term encompassing a variety of fixed constructions: collocations, 
idioms, proverbs, winged words and routine formulae. The last group is com- 
posed of units whose indirectness, i.e. non-literal character, is rooted in the form, 
not in mental imagery (Dobrovol’skij and Piirainen 2005: 21). 

As observed by Dobrovol’skij and Piirainen (2005: 20), one of the main prob- 
lems encountered while dealing with indirect utterances consists in the fact that 
there are many fixed multiword combinations which resemble conventional ex- 
pressions, but which still remain textual units, not lexical ones. Yet it has to be 
admitted that it is rather difficult to draw a definite borderline for all candidate 
units, so a vast transitory area can be assumed. Although routine formulae are 
very useful from a communicative perspective, they tend to be given less scope 
in phraseological analyses than figurative idioms. Therefore, it is very important
to focus on pragmatic multiword units used in spoken language to offer insight 
into their specifics. 


1.1 Pragmatic Idioms: Properties and Classifications 


In modern research on phraseology, as observed by Pawley (2007), since the 
1970s, there has been an increasing interest in situation-bound expressions — 
pragmatic formulae. These are known by a number of different terms such as rou- 
tine formulae, communicative phrasemes, pragmatic phrasemes or pragmatemes, 
functional idioms, interpersonal idioms or pragmatic idioms (Aijmer 1996; Burger 
2010; Coulmas 1979; Cowie et al. 1983; Fernando 1996; Fléchon et al. 2012; Lüger 
2007; Mel’čuk 1988; Roos 2001).

Basically speaking, pragmatic idioms are conventionalized multiword ex- 
pressions which are used in recurrent situations (Fiedler 2007: 50; Lüger 2007: 
444). Although the need for studies on pragmatic aspects of idioms was noticed 
by Fillmore et al. (1988), Pawley (2007: 19) draws attention to the fact that “there 
have been surprisingly few studies of the full array of attributes exhibited by 
pragmatic formulae”.¹ Such expressions perform various functions in the realiza-
tion of speech acts; they are used for instance as greetings (How are you?), leave-
taking formulae (Take care!), encouragements (Keep smiling!), replies (You're
welcome), congratulations (Happy birthday!) etc. Selected Polish expressions be-
longing to this group will be analyzed in the analytical part of the paper (Section 2).

Fernando (1996) draws attention to the fact that pragmatic idioms differ sig- 
nificantly from ideational ones, which is important from the research perspective. 
First of all, pragmatic idioms are “overtly or covertly marked for interaction” (Fer-
nando 1996: 154). Being discourse-oriented, they contribute greatly to structuring
the conversation and ensuring its coherence. To a great extent, the realization of 
stereotyped speech acts relies on conventional phrases (Kauffer 2013). In fact, alt- 
hough some of the expressions in question are fixed and lexically invariant, like 
After you or Say when, others are embedded in variant forms. Happy birthday is
an example of this latter group: it can appear on its own as well as in variants, for
instance, Have a happy birthday, Have a very happy birthday, Wishing you a very
happy birthday etc. (Fernando 1996: 154).

1 As for the attributes of pragmatic idioms, it is worth mentioning the contributions of Pawley
(1991, 2001), Coulmas (1981) and Aijmer (1996). Ruusila (2015), who focuses on the lexicographic
description of pragmatic fixed expressions, also offers an insight into their specifics (Ruusila
2015: 25-112).

Adopting the function as the main criterion, Granger and Paquot (2008) pro- 
pose an extended version of Burger’s classification of fixed expressions (2010: 36- 
42), which is adopted for the purpose of the present study. They distinguish three 
main groups of phrasemes: referential, textual (extending Burger’s structural 
phraseme category), and communicative. Referential phrasemes are carriers of 
content messages, i.e. they refer to persons, objects, phenomena etc. According 
to Granger and Paquot (2008: 42), this group comprises the following kinds of 
units: lexical and grammatical collocations, idioms, irreversible bi- and trinomi- 
als, similes, compounds, and phrasal verbs. Textual phrasemes include: complex 
prepositions, complex conjunctions, linking adverbials, and textual sentence 
stems. As for the last group, it can be said that 


Communicative phrasemes are used to express feelings or beliefs toward a propositional 
content or to explicitly address interlocutors, either to focus their attention, include them 
as discourse participants or influence them. 

(Granger and Paquot 2008: 42) 


Communicative phrasemes constitute a large and varied group which includes
several kinds of units. Coulmas (1981) offers a classification comprising five classes
of pragmatic idioms (discourse structuring formulae, formulae of politeness,
metacommunicative formulae, formulae expressing the speaker’s emotional attitude,
delaying formulae), with a further subdivision into 17 subtypes. Fernando
(1996) proposes four categories of interpersonal idiomatic expressions: markers
of conviviality, institutionalized good wishes and sympathy, information-oriented
units, and markers of conflict, subdividing each of the groups.

As for the main interactional functions of formulae, Wray (1999) and Wray 
and Perkins (2000) distinguish the following three: manipulation of others, in- 
cluding such subtypes as politeness markers, commands, requests etc.; asserting 
separate identity, comprising for instance personal turns of phrase and turn 
claimers; asserting group identity, with such subtypes as, inter alia, proverbs and 
hedges. 

Granger and Paquot (2008: 44) list the following main kinds of pragmatic ex- 
pressions: 

1. Speech act formulae or routine formulae - phrasemes which are stereotyped
ways of performing given functions, for instance, greetings, farewells, compliments etc.

2. Attitudinal formulae - units expressing language users’ attitude towards
their utterances and their interlocutors.

3. Commonplaces - non-metaphorical sentences expressing tautologies, truisms
and sayings reflecting everyday experience, observations etc.

4. Proverbs - sentence-like units expressing widely accepted ideas in a figurative
way.

5. Slogans - short phrases of directive character which have been made popular
as a result of their repeated use in advertising texts or political discourse.


It should be emphasized that many formulae can fulfil several pragmatic func- 
tions in communication. An illustrative example is the English phrase you know, 
which can be used as a filler, an attention-seeking formula and an appeal for 
shared knowledge (Moon 1998: 188). Similarly, as observed by Inoue (2007:
163-167), here we go is a multifunctional unit.² It can be employed to capture
attention, to rouse people to do something, to express irritation, to show
agreement, to indicate that the speaker has found something, and to show
something to the interlocutor.


1.2 Key Issues in Research on Pragmatic Idioms 


As already mentioned, there has been a constant interest in pragmatic idioms 
over the last decades. Scholars adopt different perspectives, both diachronic and 
synchronic, to analyze various issues concerning pragmatic idioms. Since formu- 
laic language contributes greatly to fluent speech production and comprehen- 
sion, the role of conventionalized multiword expressions has been analyzed from 
this perspective by a number of people, including Wray and Perkins (2000). Spe- 
cial attention is paid to the importance of prefabricated language including prag- 
matic idioms both in mother tongue acquisition and foreign language teaching 
(Nattinger and DeCarrico 2001; Nosowicz and Szerszunowicz 2004). Yet one of 
the main research questions remains how to identify formulaic language (Dobro- 
vol'skij and Piirainen 2005: 20; Wray 2009: 27-51), a problem that is closely re- 
lated to the general definition of formulaicity. 

In terms of linguistic analysis of multiword expressions, the construction 
grammar approach offers a methodological basis for analyses. Adopting this ap- 
proach (Croft 2001) allows for viewing constructions as partially arbitrary sym- 
bolic units, i.e. pairings of form (syntactic, morphological and phonological prop- 
erties) and meaning (semantic, pragmatic and discourse-functional properties). 


2 In her book, Inoue (2007) also offers detailed analyses of the units you know what and let's say 
including their variant forms. 
This perspective offers an appropriate basis for conducting a multiaspectual
analysis of constructions (cf. Richter and Sailer 2014; Ziem 2014).

In fact, as already mentioned, fixed word combinations can be analyzed from 
various angles. While discussing the developments in the study of formulaic lan- 
guage since 1970, Pawley draws attention to several research problems. One of 
the most important aspects is the presence of pragmatic idioms in oral genres. All
kinds of genres, including the oral ones, can be analyzed from a phraseological
viewpoint. For example, Brown (1987) investigates radio sports commentaries on 
rugby, Kuiper (1996) offers an analysis of auctioneers’ sales talk and Hickey and 
Kuiper (2000) discuss the highly formulaic character of the New Zealand meteor- 
ological weather forecasts. 

Another important issue is the relation between pragmatic idioms and cul- 
ture. In fact, as emphasized by Piirainen (2008: 215), “only few routine formulas 
are figurative in the sense that elements of culture can be found in their source 
domain”. One of them is the expression Touch wood!, a gesture-based idiom
which speakers use after saying that something is going well, to express the
wish that it will continue in this way. The gesture comes from old folk
beliefs, according to which “woods and trees have good spirits” (Piirainen 2016: 
452). 

Yet, as observed by Piirainen (2008: 215), pragmatic idioms are “part of a 
larger complex of stereotyped action patterns and social interaction”. If this per- 
spective is adopted, then communicative phrasemes can be analyzed in terms of 
culturally conditioned communication. Therefore, it can be concluded that lin- 
guo-cultural analyses of pragmatic idioms may reveal many interesting facts re- 
garding the ethnic community in which they developed. 

From a linguo-cultural perspective, it is also worth analyzing such idioms di- 
achronically, observing the development of formulaic language over centuries. 
For instance, Filatkina and Hanauska (2010) discuss formulaic language in
material excerpted from a corpus of German historical texts from 750 to 1750,
with a focus on how routine formulae were used in Old High German texts and
which functions they performed. Studies of different periods, even not very
distant ones, are likely to offer interesting findings, as is the case with the
comparison of communication styles and style-related phraseology in Poland
before and after the transformation of 1989 analyzed in the present paper.

In the same vein, i.e. taking into consideration the cultural aspect, pragmatic
idioms have been analyzed from a cross-linguistic perspective: one such study
was conducted by Jakubowska (1998, 1999), who looked into cross-cultural
dimensions of good wishes and, in a broader perspective, politeness, discussing
many fixed expressions which function as exponents of politeness in Polish and
English. The inclusion of this component facilitates the understanding of
changes occurring in the sphere of pragmatic phraseology, which is attested by
the observations regarding Polish pragmatic idioms coined in the
post-transformation period. Such analyses can reveal similarities and
differences, and can detect cross-linguistic lacunae in pragmatic
phraseological stock.

All these approaches contribute greatly to developments in the field of for- 
mulaic language, since they enable researchers to analyze pragmatic idioms from 
various angles. Adopting a linguo-cultural perspective, one can assume that as a 
result of cultural developments, new oral genres will develop, which involves the 
coinage of new pragmatic idioms. Significant cultural changes may trigger the 
creation of new expressions, including those of pragmatic character. This phe- 
nomenon will be discussed on the basis of Polish phrasemes coined after the po- 
litical transformation of 1989. 


2 Polish Phraseology after the Transformation of 
1989 


In his article on Poland, French sociologist Alain Touraine commented on the 
1989 transformation, saying that 


Kraj oderwał się od Wschodu i stał się częścią Zachodu [...] Zrobił to z niezwykłą polską
brawurą, powtórzę, skok na konia i galopem! [lit. The country got disconnected from the
East and became part of the West. [...] It did it with incredible Polish bravery, I’ll repeat, on
horseback and at a gallop!].

(“Gazeta Wyborcza”, 7-8.08.1999) 


This observation reflects very concisely on the importance of the transformation, 
emphasizing the dimension of its consequences. 

The notions of the East and the West are key from a Polish perspective: in
fact, Poland is a country with a turbulent past and a special location,
described by Mrożek, a Polish writer, as a country to the east of the West and
to the west of the East.³ The year 1989 is momentous in the history of the
country, since


3 Sławomir Mrożek (1930-2013) was a Polish dramatist and writer. His works belong to the
Theatre of the Absurd. He shocked his audience with non-realistic elements and political and
historical references in order to present the absurdity of real socialist life. His most famous
works are Tango (Tango) and Emigranci (The Emigrants). The sentence quoted, "Pochodzę z kraju
położonego na
it marks the beginning of the political and economic transformation, often
referred to as a turning point. The changes which occurred were fundamental and
influenced all spheres of life in Poland.


2.1 The Transformation of 1989 as a Turning Point in Polish 
History 


The year 1989 was of great importance not only to the Polish, but also to the other 
Central European nations. It was then that the transformation began - a transfor- 
mation that was caused by the fact that the political system did not function and 


Both the governed and governors of these countries almost commonly accepted that the 
economic system based on central management and state ownership lost in the competition 
with the system based on private property and individual entrepreneurship, market com- 
petition, coordinating role of prices and regulatory role of law. 

(Gomułka 2016: 19) 


In June 1989, the anti-communist party "Solidarity" won the first partially free 
election. The two main areas in which changes occurred were politics and the 
economy. In January 1990, the old political system collapsed completely: the 
Polish United Workers' Party (PZPR) was dissolved, democracy was adopted, a 
multi-party system was introduced and different party organizations were set up 
(Banaszkiewicz-Zygmunt and Olendzki 2000: 407). 

In the economic sphere, there were several goals: the first was to restore a 
sustainable macroeconomic equilibrium, the second to fully liberalize prices and 
foreign trade, the third to restore the development potential of the Polish econ- 
omy so that the difference in the standard of living in relation to Western Europe 
could be reduced and - in the end - eliminated (Gomułka 2016: 19). In 1990, the 
government introduced broad economic reforms to curb hyperinflation and to 
thoroughly restructure the country (Banaszkiewicz-Zygmunt and Olendzki 2000: 
407). 

The changes were pervasive: the transformation influenced Poles' perception 
of Europe (Bartmiński 2001: 45-49). Furthermore, certain differences occurred in 
the perception of nationalities: for example, the image people have of a German 
has improved, while that of a Russian has deteriorated - modifications which 


wschód od Zachodu i na zachód od Wschodu" [lit. I come from a country situated to the east of
the West and to the west of the East], comes from Kontrakt (Contract, 1986).
correspond to the re-evaluation of the oppositions: East-West, Asia-Europe,
Communism-Capitalism (Bartmiński 2001: 39).

The turning point of 1989 also had an impact on the hierarchy of values: work 
and climbing up the career ladder became much more important than before, 
with new opportunities offered by various companies and branches of foreign 
corporations. Being successful became the main objective, which involved 
changes in the creation of the self-image (Szerszunowicz 2007: 43). The excessive 
modesty imposed by Polish culture was no longer useful at job interviews, which 
were a new phenomenon on the Polish labor market.⁴ Achieving success, Poles’
new priority (cf. Ożóg 2004: 237), involved the conscious creation of one’s own
image: the awareness of having appropriate soft skills increased greatly. These 
changes serve as examples showing how the transformation influenced Polish 
society. 


2.2 Language Changes in the Post-Transformation Period 


The transformation of 1989 influenced all spheres of life, including language.
First of all, there was a significant increase in the creation of new words and
the coinage of idiomatic expressions. It resulted from a period of growth in
the lexicon triggered by this important event in Polish history.⁵ The political
and economic changes gave rise to many new phenomena which had to be named.
Consequently, many neologisms entered the language at that time. The adoption
of the new system required new terms for institutions, organizations,
positions, forms of activity etc.

Another way of expanding the Polish lexicon was borrowing. After the
transformation of 1989, contact with other countries became much more
intensive: Poles began to travel freely and many foreign companies started to
operate on the Polish market. As in other languages, English, the modern lingua
franca, was


4 Galasiński (1992: 58) emphasizes that there is a contradiction between a person's wish to pre- 
sent himself/herself at his/her best and others' negative evaluation of all such efforts. The exist- 
ence of this contradiction results in developing means which allow a person to boast and at the 
same time to avoid punishment for doing so. 

5 According to Chlebda (2001: 159), two kinds of phraseological stock growth can be distin- 
guished, i.e. a constant growth and a periodic one. The former is related to the development of 
science, technology and culture, due to which there is a constant need to create new words and 
expressions. The latter is caused by a very important event such as the political and economic 
transformation of Poland in 1989 in this case, which results in greater nomination needs than in 
the case of constant growth (Szerszunowicz 2015). 
the main donor language. Many words and idioms were borrowed during the
post-transformation period. Loans were very common in those areas which were
either non-existent or poorly developed in Communist times, such as computing,
marketing and advertising. In the case of these three, mainly English terminology
was adopted (Ożóg 2004: 237-238).

From a broader perspective, the new post-transformation reality called for
different means of communication from those used in the previous system
(Marcjanik 2007; Szerszunowicz 2007). The changes concerned both the public
sphere and the private one. In terms of public communication, Newspeak
disappeared and new forms, such as public debate, were created (Fras 2005). The
language of politics changed considerably: after 1989, Wałęsa, the first Polish
president in the post-transformation period, used informal language rich in
figurative expressions, which was completely different from the style of
communication of Communist dignitaries, and this had an influence on the
language of politics.

As for everyday conversation, the American style of communication had an
impact on Polish language behaviour.⁶ For instance, American small talk
influenced to some extent the way Poles communicate (Szerszunowicz 2007: 41-47;
Grybosiowa 2003). In everyday informal and semi-formal language contact, more
exponents of a positive attitude are attested than before, which to a great
extent is related to speakers' self-image creation efforts (Szerszunowicz
2007). Informality has become a desirable quality (Ożóg 2004: 237), which is
reflected, inter alia, in interviews on television and radio programs in which
people who have never met before often prefer to use their first names - a
custom not typical of Polish culture (Marcjanik 2007). This behavior can be
classified as a violation of Polish norms resulting from the adoption of a
foreign model of communication (Dąbrowska 2001: 188; Grybosiowa 2003; Marcjanik
2007).


6 The West, comprising the western European countries and the USA, is perceived as attractive 
from the Polish perspective. It can be assumed that the American everyday routines shown in 
films influenced the style and manner of communication of the young generation. Moreover, 
since the transformation, foreign travel has become common, which has also contributed to the 
adoption of certain new communicative behaviors (Ożóg 2004; Skowroński 2007). 
3 An Integrated Approach to Pragmatic Idioms: 
A Case Study of New Polish Pragmatic Idioms 


Pragmatic idioms are important tools in the process of communication and their 
main function is to constitute speech acts, thus revealing “aspects of culture- 
based social interaction" (Piirainen 2008: 215). Since their nature is complex, a 
multiaspectual analysis is required to determine their properties. The proposed 
approach integrating different aspects permits an in-depth analysis of the com- 
plexity and specificity of pragmatic idioms. The study will encompass not only 
linguistic aspects, but also those related to extralinguistic issues. The presenta- 
tion of the proposed model will be followed by three case studies of Polish prag- 
matic idioms. All the units came into use after the transformation of 1989 and are 
used in the spoken variety of Polish. 

Pragmatic idioms are language units which exhibit a variety of features. 
Analyzing the idiomatic expressions in question involves a comprehensive study 
of all aspects related to their linguistic characteristics and properties, status and 
functions in a given language. For instance, apart from lexico-grammatical char- 
acteristics, the stylistic value of pragmatic idioms should be analyzed, since prag- 
matic idioms and their variants may have different markedness. A comprehen- 
sive approach ensures a true and fair picture of a particular unit, allowing the full 
set of its properties to be revealed. 

Furthermore, the cultural aspect is of importance, as will be attested by the 
examples to be analyzed. In the case of the Polish post-transformational reality, 
the new culturally-conditioned situations generated many pragmatic idioms 
(Grybosiowa 2003; Marcjanik 2007; Szerszunowicz 2015). To fully understand 
them, the inclusion of the cultural background is necessary. For example, the 
changes in Poles' perception of the new reality (e.g. success as a well-deserved 
award, the awareness of positive self-presentation) caused by the influence of the 
English-speaking world, especially the USA and Australia, resulted in the coin- 
age of new pragmatic idioms, suited to the post-transformation conditions 
(Grybosiowa 2003: 180). Cultural changes influence communication strategies 


7 The American influence is observed in many spheres of life, not only communication 
(Marcjanik 2007; Ożóg 2004; Szerszunowicz 2007), but also in others, such as, inter alia, culinary
culture (Skowroński 2007). As observed by Skowroński (2007: 362), it may be hard to determine
the border between the phenomenon of Americanization and that of globalization, yet it can be
assumed that from the Polish post-transformational perspective it is the USA which sets the
standards for what is desired. The rule that what is American is better than our own results in
the ease of adoption of American models of communication, behavior etc. Skowroński (2007:
and styles, which gives rise to new units - in the case of the Polish language, the
exponents of positive thinking (e.g. Będzie dobrze, lit. ‘It will be good’; Damy radę,
lit. ‘We will make it’).

A comprehensive analysis may include other elements: the quality and quan- 
tity of parameters depend on the unit itself. An interdisciplinary approach is 
taken and includes such disciplines as marketing. For instance, after the trans- 
formation, many shops began to introduce their own marketing policy, which 
comprised a standardized way of addressing clients. In one of the Polish chain
stores, PPS Społem, a specially designed set of pragmatemes was launched: for
example, when the client has paid, while giving him/her the receipt, the shop
assistant always says: Dziękujemy, zapraszamy! (lit. ‘Thank you [and] we invite
[you]’). Analysis of such units involves the inclusion of elements of marketing and
customer loyalty studies. 

In the case of pragmatic idioms, a multiaspectual analysis is necessary to de- 
termine their properties and functions. The analysis should comprise various fac- 
tors to ensure that the specific character of a given unit will be detected. Consid- 
ering the cultural background in a broad sense may be of importance, too. The 
adoption of the integrated approach results in a comprehensive description of the 
analyzed units. 

The proposed approach will be implemented in the case studies of selected 
Polish pragmatic idioms which came into use after the transformation of 1989. 
The units chosen for the analysis reflect the cultural changes related to the new 
post-transformational political and economic situation. Analyses of three expres- 
sions will be conducted to show how this approach can be applied in practice in 
the study on recent units. 


3.1 Miłego dnia! (lit. ‘Have a nice day!’) 


Pragmatic idioms should also be viewed from the perspective of genre studies. 
Such expressions have an act-constituting potential, thus it is of importance to 
determine in which genre they tend to be used. Fixed multiword units of prag- 
matic character appear in various genres, both spoken and written. The genre 
perspective relates to discourse analysis, which should also form part of studies 
of pragmatic idioms. They are used in various forms of discourse, for instance, 


363) draws attention to the fact that in Poland, the statement It is like that in the USA may func- 
tion as an argument in a discussion. It means that the solution followed by this comment is the 
best possible one. 
live speeches, demotivators etc. Furthermore, due to the act-constituting
potential of the units in question, situational aspects should be included (Filatkina
2007: 137-138). For instance, it has to be determined who usually uses a given 
expression when talking to whom, in which situations, and which functions the 
units may perform. A corpus of the spoken variety is useful in such analyses, of- 
fering many examples of use of given units, as shown by Inoue (2007). 

The first pragmatic idiom chosen for analysis belongs to the category “greet- 
ings and farewells”. Goffman (1971: 79) calls such expressions “access rituals” 
and states that "greetings mark the transition to a condition of increased access 
and farewell to a state of decreased access" (Goffman 1971: 47). The expression 
Miłego dnia! (lit. ‘Have a nice day!’) belongs to the latter group, since it is a
leave-taking formula.

The unit Miłego dnia! started to be used in Polish after 1989: the phrase was
introduced by American corporations which brought their own communication
model with many ready-made expressions (Marcjanik 2007: 53). In fact, in
English, this expression is commonly used and by no means limited to one sphere
of communication - it is universal and may occur in various contexts
(Grybosiowa 2003: 182). The English pragmatic idiom Have a nice day!, calqued
into Polish as Miłego dnia!, was introduced at staff training sessions
(Marcjanik 2007: 53). In such companies, the firm recommended this expression
as a way of closing a conversation with a customer (Marcjanik 2007: 53). The
function of the unit is to warm up relations between persons who either know
each other on a client-staff basis or have never met before.

Initially, the idiom aroused doubts among Polish normative linguists, who 
perceived it as "foreign", unrelated to Polish culture and not compliant with 
Polish standards of politeness (Grybosiowa 2003: 182; Marcjanik 2007: 53-54). It 
was viewed as too informal and as interfering in the interlocutor's private life, in 
some way imposing a good mood on the addressee. Another aspect which might 
have raised objections is the fact that in Polish informal discourse more sincerity 
is expected.⁹ For instance, answers to the Polish question Jak się masz? [lit. How 


8 In fact, as observed by Marcjanik (2007: 53), wishes are part of Polish farewell formulae, for 
instance, wishing somebody a good journey. However, wishing the interlocutor a good day is 
not rooted in the Polish tradition. In the past, as observed by Grybosiowa (2003: 182), there were 
some farewell formulas of performative character, like Zostań z Bogiem (‘Remain with God’) or
Bądź zdrów (‘May you be healthy’), which are no longer in use.

9 The reactions to the question Jak się masz? (‘How are you?’) may vary in Polish, depending on
the interlocutor's mood (Grybosiowa 2003: 178). However, over recent decades, the responses
have shifted towards the creation of a more positive self-image on the part of the speaker
are you?] may vary, including complaining and expressing negative feelings
(Wojciszke and Baryła 2001: 45-64). Thus, taking into consideration the cultural
differences in this respect, in many communicative situations in which unrelated
persons are involved, Miłego dnia! may sound unnatural.

Marcjanik (2007: 54) observes that the use of the expression is limited to the 
sphere of services and trade: Miłego dnia! is a phrase used by shop assistants,
bank clerks and hairdressers. The clients may respond using the same idiom
(Miłego dnia! ‘Have a nice day!’), replying Wzajemnie/Nawzajem (‘The same to
you’) or by thanking the employee (Dziękuję ‘Thank you’). Since it is limited to
services and trade, according to Marcjanik (2007: 54), its use is not recommended 
in other spheres of life. 

Irrespective of normativists’ opinions, the borrowing gradually began to be
used outside the branches of American corporations - for instance, in everyday
informal conversations (Grybosiowa 2003: 182).¹⁰ In one of Tomasz Jastrun's
novels, a character describes the phrase after using it at the end of his
utterance in the following way: "Życzę miłego dnia (ten zwrot upowszechnia się
w Polsce i pewnie zrobi taką karierę jak słowo dokładnie na przełomie lat 80. i
90.)" (lit. ‘I wish a nice day (the expression is becoming widespread in Poland
and is bound to make a career like the word exactly at the turn of the 80s and
90s)’) (NKJP).¹¹ This statement does justice to the status of the phrase in
question in the spoken variety of Polish.

As for the linguistic properties of a formula, the expression has to be
analyzed from the point of view of its fixedness and variance: Does it appear
in other forms? If it does, which constituents are substituted and which
factors condition the alterations? Or is it perhaps a pattern which freely
generates realizations? The canonical form has to be determined, which in some
cases may be problematic, especially because the units are used in spoken
language. The problems may be related not only to the wording itself, but also
to the graphic form.¹²


(Jakubowska 1999: 58), which is also confirmed by a study of the small talk genre from a Polish- 
English perspective (Szerszunowicz 2007). 

10 In fact, the unit is not mentioned in the monograph on cross-cultural dimensions of polite- 
ness in Polish and English (Jakubowska 1999). 

11 A detailed presentation of the Polish national corpus (NKJP) is given in a monograph availa- 
ble online (Przepiórkowski et al. 2012). 

12 For instance, in 2005, Poradnia językowa PWN, an online language advisor, received a
question regarding the graphic form of a common informal greeting siema with alternative
spellings (sie ma/sie ma). The consulting linguist (Grzenia 2005) answered: “najlepiej napisać
siema, jest to zresztą już powszechnie stosowany zapis" (lit. ‘the best spelling is siema, this
spelling is common’). In the case of some units, phonetic features, such as intonation, are
also important.
In modern Polish, a phraseological pattern has developed in which the noun
dzień ‘day’ can be substituted with another noun. In fact, the phrase Miłego dnia!
generated the model [Życzę] X miłego Y (lit. ‘[I wish] X a nice Y’), in which X is the
addressee of the speaker's wishes, for instance Panu/Pani/Państwu/ci/wam (lit.
‘Sir/Madam/Sir and Madam/you’) etc., and Y is the period meant to be nice, like
dzień ‘day’, popołudnie ‘afternoon’, weekend ‘weekend’ or tydzień ‘week’.
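
The slot-filling character of the model [Życzę] X miłego Y can be illustrated with a minimal
sketch. It is only an illustration under stated assumptions: the function name realize and the
two lookup tables are hypothetical, the genitive forms are those attested in the expressions
discussed in this section, and the sketch fills the slots mechanically without judging how
natural or frequent a given combination is.

```python
# Illustrative sketch of the pattern "[Życzę] X miłego Y".
# All names below (ADDRESSEES, PERIODS_GENITIVE, realize) are hypothetical.

# Slot X: addressee forms listed in the text (empty string = slot left unfilled).
ADDRESSEES = ["", "Panu", "Pani", "Państwu", "ci", "wam"]

# Slot Y: nominative form of the period -> genitive form required after "miłego".
PERIODS_GENITIVE = {
    "dzień": "dnia",
    "popołudnie": "popołudnia",
    "wieczór": "wieczoru",
    "weekend": "weekendu",
    "tydzień": "tygodnia",
}


def realize(period: str, addressee: str = "", with_verb: bool = False) -> str:
    """Fill the slots of the pattern and return one realization."""
    if addressee not in ADDRESSEES:
        raise ValueError(f"unknown addressee form: {addressee!r}")
    parts = []
    if with_verb:
        parts.append("Życzę")      # optional verb, bracketed in the model
    if addressee:
        parts.append(addressee)    # slot X
    parts.append("miłego")
    parts.append(PERIODS_GENITIVE[period])  # slot Y in the genitive
    phrase = " ".join(parts)
    # Capitalize the first letter only and close with an exclamation mark.
    return phrase[0].upper() + phrase[1:] + "!"


if __name__ == "__main__":
    print(realize("dzień"))
    print(realize("weekend", addressee="Państwu", with_verb=True))
```

With both optional slots left empty, the sketch yields the canonical form; filling them yields
the extended variants described above.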

To verify the status of the expressions in question, Narodowy Korpus Języka
Polskiego (NKJP) was consulted. The corpus search was conducted with the PELCRA
search engine for the canonical unit Miłego dnia! and three derived forms, i.e.
Miłego popołudnia!, Miłego wieczoru! and Miłego weekendu!. It shows a fairly
high frequency of the unit Miłego dnia!, with 142 occurrences, and several uses
of the variant forms: Miłego wieczoru! - 19 occurrences, Miłego weekendu! - 19,
Miłego popołudnia! - 6. A WebCorp search, which makes it possible to analyze the
distribution of a given phrase across a wide spectrum of genres, shows that the
expressions in question (Miłego dnia!, Miłego popołudnia!, Miłego wieczoru!,
Miłego tygodnia!) also function in different contexts: for instance, these
phrases are included as a category in an online catalogue of pictures
accompanied by wishes for a nice day, afternoon, evening or week to be sent via
email (Obrazki Online).
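
To make the proportions behind the NKJP figures easier to see, the reported counts can be
converted into shares of the total. The sketch below only restates the four counts given
above; the variable names are hypothetical, and no further corpus data are assumed.

```python
# Occurrence counts as reported in the text for the NKJP (PELCRA) search.
counts = {
    "Miłego dnia!": 142,
    "Miłego wieczoru!": 19,
    "Miłego weekendu!": 19,
    "Miłego popołudnia!": 6,
}

# Total number of hits across the canonical form and its three variants.
total = sum(counts.values())

# Share of each form among all hits for the pattern.
shares = {phrase: n / total for phrase, n in counts.items()}

if __name__ == "__main__":
    for phrase, n in counts.items():
        print(f"{phrase:<20}{n:>5}{shares[phrase]:>8.1%}")
    print(f"{'Total':<20}{total:>5}")
```

On these figures the canonical form accounts for roughly three quarters of the hits, which is
consistent with treating Miłego dnia! as the base of the pattern.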

The phrase Miłego dnia! is used in various communicative contexts, ranging
from clerk-client conversations to friendly chats. It is becoming a universal
conversation-ending formula. The widespread use of the unit Miłego dnia! reflects
the changes occurring in the style of Polish conversation, and, from a broader 
perspective, in Poles’ communicative behavior. The English-speaking world's 
communication standards oblige the speaker to use exponents of positive atti- 
tude, encapsulated in the saying If you don’t have anything nice to say, don’t say 
anything. Many of the exponents of positive attitude and being nice are pragmatic 
idioms, such as How nice to see you, I'm fine, Well done! etc. To some extent, the 
increasing popularity of the analyzed phrase and its variants shows that this ten- 
dency is being adopted in modern Polish communicative culture. 


3.2 Ja panu nie przerywałem (lit. ‘I didn't interrupt you, sir’) 


After the transformation of 1989, the changes in the sphere of Polish politics were 
fundamental. The adoption of the new system resulted in the creation of different 


Moreover, extralinguistic aspects may be of importance, too: for example, certain pragmatic id- 
ioms tend to be used with a particular gesture, for instance, Touch wood! or I'll keep my fingers 
crossed! 
communicative situations. The new genres, such as political debate, provided the 
opportunity to voice opinions freely in public. In the budding democracy, politi- 
cians had to work out their own ways of discussing issues in public. Since many 
political discussions were televised after the downfall of Communism, viewers 
were able to witness the development of the genre. 

Fras (2005: 97) draws attention to the fact that in Poland, there is no tradition 
of such public debate where participants presenting views unacceptable to others 
would be treated with due respect. Until 1989, the ability to participate in public 
discussion had been of little importance and consequently these skills had not 
developed. Under Communism, communication was not based on interaction — 
it was one-directional, with censorship and police restrictions. As a result, there 
is no model for participation in public debate. 

The post-transformational situation was a new one and there was a shortage of
fully professional politicians skilled at public speaking and creating a
positive self-image, so viewers were rather disappointed by the quality of the
televised debates. The language used in the debates varied from informal,
sometimes even aggressive and vulgar, to formal, official, with an inclination
towards using specialized terminology. Moreover, in the post-transformation
period, after the Newspeak period, public discourse was rich in colloquial
expressions, in many cases even too informal, and thus rude (e.g. Pani jest
śmieszna! (‘You are ridiculous’), Pani jest chyba chora! (‘[Maybe] you are
ill’); Grybosiowa 2003: 183-184).

What immediately captured viewers’ attention was the phrase Ja panu nie
przerywałem (lit. ‘I did not interrupt you, sir’), overused by many public
speakers who participated in political discussions. The constituent panu ‘sir’
can be substituted with pani ‘madam’ or the plural form comprising
representatives of both sexes - państwu ‘sirs and madams’.

The formula can be described as a single-sentence speech act performing two 
main functions: phatic and persuasive. In political discourse in the media, it is
extremely frequent, occurring in almost all television or radio programs where
the representatives of various parties meet (Zimny and Nowak 2009: 91). In order 
to create an emotional dispute, the host of the program stirs up a pseudo-conflict 
in which the phrase tends to be used frequently. 

As a result of having been used so frequently, this phrase has now become 
an element of political communicative ritual in Polish debates. Politicians tend 
to employ it in their utterances, since using the pragmatic idiom Ja panu nie 
przerywałem implies that the speaker is a well-behaved, tactful and collected
person who is able to contribute to the discussion in an appropriate way and is open,
cooperative and skilled at starting and conducting a dialogue (Kampka 2009: 162).
This image is desired by politicians and using this kind of idiom is meant to
contribute to conveying to the viewers a positive picture of the speaker. At the same
time, the opponent to whom the expression is addressed is presented as one de- 
void of good manners, impolite and unable to participate in public debate. 

In fact, the idiom Ja panu nie przerywałem has already begun to be perceived
as a kind of phatic gesture. Its status is attested by the presence of the unit in 
various cultural texts such as demotivators or cabaret shows. The phrase is used 
as an element of the satirical image of a politician (Zimny and Nowak 2009: 91), 
for example, in one of Tym’s editorials: 


Dzieci, które nie nauczą się słuchać innych, zostają parlamentarzystami zapraszanymi do
dyskusji telewizyjnych. I oto mamy czwórkę dzieci, każde z innym logo na śliniaczku, które
mówią jednocześnie “Ja panu nie przerywałem”, a pan Rymanowski próbuje im zadać 
pytanie. 
[lit. Children who have not learnt to listen to others become members of parliament invited 
to television discussions. And here we have four children, each with a different logo on his 
bib, who are all saying at the same time, “I didn’t interrupt you”, and Mr Rymanowski is 
trying to ask them a question.]¹³

(NKJP) 


The phrase is underrepresented in the corpus, with only 7 occurrences, which
results from the contents of the NKJP. Yet the results of the WebCorp search
indicate that the expression has become part of the lexicon
of the Polish language. The phrase is multifunctional, performing both phatic
and persuasive functions. Occasionally, the expression is used in informal dis- 
course to introduce humor into the conversation or as an element of memes, 
which also confirms its well-established status in the modern Polish language. 


3.3 Byle do piątku! (lit. ‘Only till Friday!’)


Although the results of the 1989 transformation were most noticeable in the areas 
of politics and the economy, they also occurred in other spheres. One of them is 
everyday life. Multiword units of pragmatic character should also be viewed from 


13 The excerpt comes from the editorial titled Pustka (‘emptiness’) published in the Polish
magazine Polityka. The author, Stanisław Tym, is a Polish satirist who has also written film scripts.
Bogdan Rymanowski is a Polish journalist and the host of the television program Kawa na ławę
(the title is an idiomatic expression similar to the English idiom Don't beat about the bush)
shown on the TVN24 channel, in which politicians representing different parties meet and
discuss important issues.




a sociolinguistic perspective. Pragmatic idioms differ depending on the speakers’ age
group, their membership of a particular subculture and other factors. For
instance, after the transformation of 1989, as the awareness of the importance of
being fit increased, numerous fitness and bodybuilding centers sprang up; the
people who went to them developed their own lexicon with idiomatic
expressions, including pragmatic units like Nie napinaj się, bo pęknie lustro (lit. ‘Don’t
try so hard, or the mirror will break’), used to suggest ironically that somebody
is trying too hard (Piekot 2000: 53).

The transformation brought many changes, one of which was that Saturdays
became free of work and school, so that Poles now had a longer weekend. After this
change, as in most western countries, Friday was the last working day of the week
in the vast majority of companies. The transformation thus increased
the number of free days, giving Polish people more leisure time.

The new situation resulted in a change in perception of the free weekdays: 
the weekend began to gain in importance and it started to play an important role 
in the collective minds of Poles.¹⁴ Its status attained after 1989 is reflected in the
Polish language of recent decades: new words and phrases have come into use,
for example, od piątku do piątku (lit. ‘from Friday to Friday’). It is not surprising,
since all kinds of weekend activities, socializing and going out are of great im- 
portance, in particular for the young. 

In youth jargon, the day preceding Friday, czwartek ‘Thursday’, is figura- 
tively named mały piątek ‘little Friday’, whereas the word piątek ‘Friday’ is used 
interchangeably with its diminutive form piąteczek ‘little Friday’, which can be
classified as a term of endearment. Another phrase showing the attitude towards
the weekend is the unit piątek, piąteczek, piątunio (lit. ‘Friday, little Friday, very
little Friday’), used to express the happiness resulting from the fact that it is
already Friday, a phrase similar to TGIF (Thank God/goodness it's Friday;
Peeters 2007: 94).¹⁵

The phrase Byle do piątku! (lit. ‘Only till Friday!’), meaning ‘May we survive
till Friday’ and suggesting ‘afterwards it will be all downhill’, is a variant form of 
phrases attested before the transformation in which seasons of the year appeared 


14 Peeters (2007) offers a study of the Australian perception of the weekend, presenting both 
linguistic and cultural data to show that it is one of the key words in Australian culture. 

15 A Google Graphics search produced a modification containing even more diminutive forms: 
Piątek, piąteczek, piątunio, piątuś, piącieczek, piątuniek – znajdziesz tysiące słów, by opisać to, co
kochasz (lit. ‘Friday, then five diminutive forms derived from the word Friday followed by a
comment: You will find thousands of words to describe what you love’). Unlike English, the Polish
language is rich in diminutive suffixes, so that many diminutives of the word piątek can be
formed. 
such as Byle do wiosny! (lit. ‘Only till spring!’) and Byle do lata! (lit. ‘Only till
summer!’).¹⁶ The expressions were used in spoken discourse to cheer the interlocutor
up, inspire hope, bring encouragement, suggest improvement in the course of 
time etc. In certain communicative situations, they can be used as leave-taking 
formulae optimistic in character. 

The use of the name piątek (‘Friday’) is significant. From a linguo-cultural
perspective, the presence of the constituent
piątek corroborates the gradual development of Polish weekend culture. Friday 
afternoon is the start of the weekend and thus the day is the indicator of the off- 
work period. 

The Polish corpus contains as few as 5 occurrences, but the WebCorp search 
brought more information on the use of the unit in question. For example, like 
the pragmatic idiom Miłego dnia!, the phrase accompanies pictures to be sent
electronically — for instance, one of them shows Bugs Bunny working in a quarry 
(Jeja). Numerous examples of its use were attested in the Google Graphics search. 
As the corpus search and the findings retrieved by means of WebCorp show,
the pragmatic idiom Byle do piątku! tends to be
used in the spoken variety of Polish as well as in internet communication such as 
blogs, chats, demotivators etc. 


4 Conclusions 


The transformation of 1989 changed Poland in many respects: a new political sys- 
tem was adopted and the economy was replaced with another model, so society 
was now adapting to a transformed reality, rich in new phenomena, processes 
and situations. This state of affairs forced language users to deal with the post- 
transformation environment and develop appropriate communicative strategies. 
It involved finding ways of functioning linguistically in the new situations: either 
borrowing, for instance, from English, modifying Polish language units or coin- 
ing new ones. 

In many of the situations, language users needed pragmatic idioms, since 
the old means of expression were now inappropriate or inadequate. After 1989, 


16 The frame Byle do X, in which X is the moment after which life should be much easier, may
be modified. An example of such a modification is observed in a demotivator which contains the
following string of realizations: Byle do 15... Byle do piątku... Byle do wypłaty... I tak jeszcze przez
30 lat (lit. ‘Only till 3 p.m.... Only till Friday... Only till pay day... And like this for the next 30 
years’). 
many idioms were coined as a result of the period of growth following the
transformation. Moreover, the constant growth also contributed to an increase in the
stock of pragmatic idioms. A multiaspectual analysis of selected examples shows 
that the study of such units involving various parameters offers an insight into 
their complex character. 

First of all, pragmatic idioms reveal a considerable amount of cultural infor- 
mation, i.e. facts regarding post-transformation Poland. The first of the idioms 
analyzed confirms the influence of the English-speaking world on the Polish com- 
municative style. It reflects the changing language behaviors in the new reality, 
in which being pleasant and friendly gains a new dimension. From the linguistic 
perspective, it can be concluded that the canonical form borrowed from English, 
Miłego dnia!, is easily modified and it can be seen that a pattern based on it now
has a place in the Polish language. 

The origins of the second unit can be traced back to the problems of budding 
Polish democracy, in which public debate still leaves a lot to be desired. The
multifunctional phrase Ja panu nie przerywałem began to be used not only in the
language of politics but also in other varieties. Found in various cultural texts,
memes or press articles, it developed a connotative potential which is also im- 
portant from the communicative perspective. 

The third expression, Byle do piątku!, which in fact exploited a pattern known
before 1989, illustrates the changes in social perception of the weekend. It is in- 
dicative of the increasing importance of having free time at the end of the week: 
Friday is the day one is looking forward to since it starts the weekend period. The 
frequent use of the expression as a text accompanying visual material which can 
be sent to an addressee is indicative of new ways of cheering somebody up. 

In conclusion, it should be emphasized that as important tools of communi- 
cation, pragmatic idioms should be analyzed from various perspectives. A multi- 
aspectual analysis of pragmatemes reveals their properties and their potential, 
which is important from both a theoretical perspective and a practical one. As for 
the former, such analyses contribute to a better understanding of formulaic lan- 
guage, while in terms of the latter, the research studies in question may improve 
lexicographic descriptions of pragmatic idioms and the quality of their presenta- 
tion in the process of language teaching. 
Mareike Keller
Compositionality: Evidence from Code- 
Switching 


Abstract: The storage and processing of phrasemes has been discussed many 
times over the past decades, with varying results. Researchers still disagree as to 
the degree to which phrasemes are stored and processed holistically or composi- 
tionally. This paper approaches the topic of compositionality through bilingual 
data, which is rarely discussed in theoretical work on phraseology. It provides a 
qualitative analysis of verb-based phrasemes, highlighting the structural and se- 
mantic features of code-switching patterns in and around phrasemes which serve 
as clues to underlying production processes. The study is based on recordings of
German-English informal conversation. The language-mixing patterns are pre- 
sented in the framework of the MLF model (Myers-Scotton 2002; Myers-Scotton 
and Jake 2017). The mixing patterns inside collocations and the resistance to mix- 
ing of more idiomatic phrasemes suggest that the surface realization of phra- 
semes in bilingual speech is determined both by morphosyntactic code-switching 
constraints and by the semantic impact of nominal and verbal phraseme compo- 
nents on the meaning of the phraseme as a whole. The findings support both the 
Superlemma Theory of phraseme processing (Sprenger et al. 2006) and the MLF 
model of code-switching, as they provide empirical evidence for the unitary stor- 
age of phrasemes at the conceptual level as well as for their compositional assem- 
bly in accordance with structural code-switching constraints during language 
production. 


1 Introduction 


One of the much-discussed but still unresolved questions related to multi-word 
sequences like idioms, semi-idioms and collocations (henceforth referred to as 
phrasemes) concerns the way they are stored in the mental lexicon: Are they 
stored and retrieved holistically or are they assembled compositionally from in- 
dividual words each time they are produced? A traditional approach to composi- 
tionality is the investigation of variation and modification in monolingual canon- 
ical data (Moon 1998; Langlotz 2006: 175-224). Further insights have been drawn 
from the analysis of non-canonical data like language acquisition, aphasia,
attrition, or slips of the tongue (Häcki Buhofer 2007; Paradis 2004; Kuiper et al. 2007).
More recently, psycholinguistic experiments have been conducted measuring 
processing speed, mostly in comprehension, but also in production (Havrila 
2009; Wray 2012). In this paper the subject of compositionality is approached via 
a largely unexplored type of data: phrasemes in naturally occurring code-switching
produced by balanced bilinguals.¹ The approach builds on the assumption
that language switching or mixing alongside or within phrasemes can be em- 
ployed as an indicator for chunking or parsing during language processing (Ba- 
ckus 2003; Wray and Namba 2003; Namba 2012). As differences between mono- 
lingual and bilingual language processing concern areas like speed of access or 
executive control rather than basic processing mechanisms (Paradis 2004; Bia- 
lystok and Craik 2010), the conclusions are not restricted to bilingual contexts but 
could also provide explanations for monolingual storage and processing of com- 
plex lexical items. 

Phrasemes are a highly heterogeneous group of lexicalized word-strings and 
the information a phraseme can reveal with respect to language processing de- 
pends on the lexical category of its syntactic head as well as on its internal syn- 
tactic structure. This paper is devoted exclusively to phrasemes in the form of 
syntactic constituents with a verb as syntactic head.² Verb-based phrasemes were
chosen because their comparatively complex argument structure provides more 
opportunities for internal language mixing than e.g. nominal phrasemes. All
examples were extracted manually from a 50-hour corpus of German-English
spontaneous speech.³ The paper provides empirical evidence of the mixing patterns in
and around phrasemes and explores the ways in which language contact phe- 
nomena can be related to syntactic and semantic properties of phrasemes. The 
findings provide clues to the mental representation of phrasemes, including the 


1 Balanced bilingualism is defined here as a native-like level of proficiency in both languages. 
2 The phraseological terminology used in this paper is based primarily on Burger (2015). The 
term phraseme will be used as a cover term for idioms, semi-idioms and collocations. Verb-based 
phrasemes, which are the focus of the study, fall into Burger's category of referential phrasemes 
in the form of syntactic constituents (nominative referentielle Phraseme, Burger 2015: 32). 

3 The data were collected between 1999 and 2005 as part of the project “Sprachkontakt
Deutsch-Englisch: Code-switching, Crossover & Co.” funded by the Deutsche Forschungsgemeinschaft
(DFG) and headed by Rosemarie Tracy (University of Mannheim) and Elsa Lattey (University of 
Tübingen). Further details on the speakers and data collection process are given in Tracy and 
Lattey (2010). My sincere thanks go to Rosemarie Tracy for access to the recordings and the tran- 
scripts. 
level at which language selection takes place and will hopefully inspire further
research into a complex but highly promising type of data. 

The structure of the paper is as follows. As most readers will be more familiar 
with phraseology than code-switching, basic assumptions concerning storage 
and processing of phrasemes are outlined only very briefly (section 2), before the 
structural approach to code-switching is introduced in more detail (section 3). 
Then the empirical data are presented and analysed (section 4). In section 5 the 
findings are discussed with respect to theoretical issues concerning the composi- 
tionality of phrasemes. Section 6 concludes the paper with a short summary and 
suggestions for future research. 


2 Conceptual Unity - Compositional Processing 


The observation that different types of phraseme exhibit different degrees of fix- 
edness or compositionality has been widely discussed among phraseologists, 
and over the past decades various taxonomic approaches placing different types 
of phraseme along a continuum have been proposed and refuted (Wray and Per- 
kins 2000). The one characteristic uniting all types of phraseme, from true idioms 
to collocations, seems to be that they are recurrently co-occurrent sequences of 
lexemes which appear to be reproduced rather than creatively assembled. Some 
of them express meaning beyond the sum of the meaning of their individual com- 
ponents, some are peculiar in their syntactic make-up — but the vast majority do 
not show any semantic or syntactic characteristics which clearly set them apart 
from free combinations of words. How, then, can we tell that one string of words 
is a phraseme and another one is not? One indispensable precondition for recog- 
nizing phrasemes as such in actual discourse seems to be their representation as 
conceptual units at some level in the mental lexicon (Backus 2003: 92). However, 
unitary representation does not necessarily entail holistic storage and processing 
all the way from the conceptual level to actual phonological realization. The 
question that remains is: Which aspects or components of a phraseme are stored 
in long-term memory, and what can be assembled online during production (see 
Jackendoff 2002: 152-195)?

It is widely assumed that phrasemes have their own entries in the mental
lexicon (Levelt 1989: 186-187; de Bot 1992: 10), but there is no agreement on what
this entry actually looks like.⁴ To describe the representation of a phraseme in the
mental lexicon, Levelt and Meyer (2000: 442) introduce the term superlemma, 
which “represents the idiom’s restricted syntax and points to a set of simple lem- 
mas.” This idea is expanded by Sprenger, Levelt and Kempen (2006) into their 
Superlemma Theory. The theory supports a hybrid view of phraseme processing 
(Cutting and Bock 1997) and claims that “[f]ixed expressions and idioms and
literal language only differ with respect to the source of word activation: while the
words of a literal phrase are activated by their own lexical concepts, the words of 
a fixed expression will benefit from a common idiom node” (Sprenger et al. 2006: 
167). This means that in a phraseme, the individual lexemes are selected from the 
lexicon via the superlemma entry for the phraseme. The Superlemma Theory is 
attractive because it treats the production of phrasemes similarly to the produc- 
tion of free combinations, and it elegantly aligns production and comprehension. 
In addition, the contradiction between conceptual unity on the one hand and 
syntactic compositionality on the other is resolved by postulating the superlem- 
ma as a conceptual unit and the component lexemes as syntactically related but 
individually accessed pieces. 

As long as we are dealing with monolingual data, we can use e.g. speed of 
access in experimental settings, or performance errors in spontaneous and elic- 
ited speech as indicators of chunking or parsing of phraseological units. When 
we look at bilingual data, the contrast between the two languages involved offers 
an additional clue to the way in which conceptual units are assembled into actual 
phonetic strings. One might assume that a string of words which appears as a unit 
on the level of conceptual representation should be barred from internal lan- 
guage mixing in order to preserve the exact meaning or pragmatic function of the 
unit. The relevance of phrasemes in contrast to simplex lexemes for the study of 
code-switching patterns was already noticed in a very early study by Hasselmo 
(1970: 196), who writes about the data he analysed: “Purely lexical conditioning 
of switching is obviously an important factor, but throughout this discourse it 
appears that larger preformulated segments play a role as well.” Later code- 
switching research has mentioned in passing that phrasemes are often inserted 
as whole constituents (e.g. Myers-Scotton 2006: 263), supporting the view that 
phrasemes are processed as units all the way from the mental lexicon/phrasicon 
to the phonetic level. However, this blanket view does not hold for all types of 


4 One problem with previous research on the topic is the definition of the target structures. Ear- 
lier works focus mainly on pure idioms, or idioms in a narrow sense. The authors cited in the 
following paragraphs may not all have had phrasemes in the wider sense in mind, but their find- 
ings are applicable nevertheless. 
phraseme. Backus (2003) cites examples of phraseme-internal language mixing,
which suggest that under specific conditions phrasemes can be broken up into
their sub-components at some point in the production pipeline.⁵ Furthermore, they
show that not all phraseme components are language-specific on all levels of
language production. I believe that the propensity or resistance of a phraseme to 
internal language mixing can be used as a clue to how phrasemes are assembled 
during language production. The following section provides a short introduction 
to the structural study of code-switching. This will serve as the theoretical back- 
ground against which the behaviour of phrasemes in code-switching is analysed. 


3 Bilingual Code-Switching 


Over the past decades, code-switching has been studied from various angles, 
such as sociolinguistics, psycholinguistics or syntax, with the goal of finding out 
which factors influence or constrain mixed utterances. Contrary to early beliefs, 
mixing languages is neither a sign of incompetence, nor does it occur randomly. 
Instead, it seems to be governed by social as well as syntactic constraints, the 
nature of which has not been fully understood. What we can say for sure is that 
code-switching constraints, just like any other grammatical rule, are probabilistic
rather than absolute.⁶ This paper focuses on the morphosyntactic aspect of
language mixing and attempts to link it to semantic factors influencing the surface
form of complex lexical items which are the object of investigation of phraseologi- 
cal research. Myers-Scotton's Matrix Language Frame Model (MLF model) (1993 
and following) serves as the theoretical framework for the account of bilingual 
phraseme processing developed in the following pages. For reasons of space the 
account must remain somewhat superficial. Readers new to the topic are referred 
to Myers-Scotton and Jake (2009) for a concise but detailed overview. 

The MLF model is cognitively based and lexically driven, which means it is 
focused on processes originating in the mental lexicon. It was devised in accord- 
ance with basic assumptions of generative grammar and aims to explain how lan- 
guage production is linked to linguistic competence (Myers-Scotton 2002: 14). At 


5 Namba (2012) also deals with the topic of mixed phrasemes in code-switching. However, as 
his analysis is based on bilingual acquisition data from two young children, his examples are 
very few and cover only a small section of frequent phraseological structures. 

6 See Mindt (2002: 210-211) who argues that any descriptive grammatical rule will have about 
5% exceptions due to online processing errors, idiosyncrasies or variation/language change.
the core of the MLF model lies the claim that the distribution of languages in
bilingual clauses is asymmetrical. One language, the matrix language (ML),
provides the morphosyntactic frame of a bilingual clause.⁷ Into this ML frame,
elements from a second language, called the embedded language (EL), can be
inserted. The MLF model’s unit of reference is the bilingual clause (Myers-Scotton 
and Jake 2017: 3).⁸ This means that the two principles restricting the surface
realization of morphemes in code-switching are only applicable to bilingual clauses.
They are not aimed at syntactic units bigger than one clause. Also, the terms ML 
and EL only refer to one clause at a time. According to the MLF model, the surface 
realization of morphemes⁹ in a bilingual clause is constrained by two principles.
These two principles state that in mixed-language constituents, word order and 
particular grammatical morphemes (e.g. morphemes transporting information 
on agreement or case) have to come from the ML: 


The Morpheme-Order Principle: In ML+EL constituents consisting of singly occurring EL 
lexemes and any number of ML morphemes, surface morpheme order (reflecting surface 
syntactic relations) will be that of the ML. 

(Myers-Scotton 1997: 83) 


The System Morpheme Principle: In ML+EL constituents, all system morphemes which have 
grammatical relations external to their head constituent (i.e. which participate in the sen- 
tence's thematic role grid) will come from the ML. 

(Myers-Scotton 1997: 83) 


The MLF model has been revised several times in order to make its predictions 
more precise. One crucial step in clarifying which morphemes are affected by the 
System Morpheme Principle was the introduction of the 4-M-Model (Myers-Scot- 
ton and Jake 2000). Myers-Scotton and Jake (2017: 2) state explicitly that the 4-M- 
Model is not itself a model of code-switching but a general model of morpheme 
processing, applicable equally well to other types of data. It relates to the MLF 


7 This assumption is made only about so-called classic code-switching “in which empirical
evidence shows that abstract grammatical structure within a clause comes from only one of the
participating languages” (Myers-Scotton and Jake 2009: 337). For mixed languages,
Myers-Scotton (2002: 100) proposes a composite matrix as the grammatical basis. The term ML should not
be mistaken for or confused with the dominant language of a speaker or a discourse. It is a gram- 
matical abstraction, applicable only within one clause (Myers-Scotton 2002: 58). 

8 In terms of generative syntax: "Our unit of analysis is the clause, or CP, the projection of com- 
plementizer, or COMP" (Myers-Scotton and Jake 2015: 418). 

9 The term morpheme is used for surface realizations (phonetic form in the actual utterance) as 
well as for the underlying lemma entry (abstract form in the speaker's mind) (Myers-Scotton 
2002: 106). 
model only insofar as it can help to explain the morphosyntactic regularities
observed in bilingual clauses. The model assumes four different types of morpheme
(hence the name, 4-M[orpheme]-Model): 1. content morphemes, 2. early system
morphemes (e.g. plural affixes), 3. bridges (e.g. possessive markers) and 4.
outsiders (e.g. case and agreement markers). Content and early system morphemes
together transport the meaning of an utterance. They are accessed at the concep- 
tual level. The two types of late system morpheme, bridges and outsiders, make 
the utterance grammatical in terms of the morphosyntactic structure projected by 
the matrix language. Their exact phonological form is selected only at the level 
of the formulator, once the thematic grid of the utterance has been laid out.¹¹ In
short, “[c]ontent morphemes and early SMs satisfy the speaker's intentions,
while late SMs provide grammatical structure” (Myers-Scotton and Jake 2017: 3).
According to the System Morpheme Principle, only the late outsider system mor- 
phemes must be supplied by the ML in bilingual constituents. Their function in a 
clause lies in “disambiguating grammatical roles and providing argument struc-


10 Explanatory note: The 4-M-Model assumes an asymmetry between content and system mor- 
phemes, which is crucial for language processing. In crude terms, content morphemes are con- 
ceptually activated lexical items (which assign or receive theta-roles; Myers-Scotton and Jake 
2015: 425), whereas system morphemes are structurally assigned functional elements. The sys- 
tem morphemes are subdivided into early and late. Early system morphemes are accessed along 
with content morphemes from the mental lexicon. They are functional affixes which add to the 
semantic content but do not affect the grammaticality of the sentence. The sentences Paul likes 
Anna's sister and Paul likes Anna's sisters are equally grammatical, but the plural affix on the 
word sister in the second one changes the meaning of the proposition. The late system
morphemes are subdivided again, into bridges and outsiders. Both help to make the sentence
grammatical. A bridge establishes a grammatical relation between lexical items within the syntactic
constituent in which it occurs. In Paul likes Anna's sister the possessive marker expresses the
grammatical relation between Anna and the sister, which in the given sentence are both
components of the same object NP. An outsider establishes a grammatical relation with a lexical item
outside the syntactic constituent in which it occurs. In Paul likes Anna's sister the agreement
marker on the verb expresses the grammatical relation between the subject NP and the verb un- 
der INFL (Myers-Scotton and Jake 2017: 7). A particular grammatical morpheme is not necessarily 
assigned to the same group crosslinguistically (Myers-Scotton and Jake 2017: 4). Its type depends 
on the kind of grammatical information the morpheme carries. In Modern English, the article 
only carries information about definiteness and is classified as an early system morpheme. In 
Modern German the article also carries information about case and is thus a late outsider system 
morpheme. 

11 The processing components referred to by Myers-Scotton are based on Levelt's model of lan- 
guage processing (Levelt 1989; Levelt et al. 1999). The model was adapted to bilingual speech by 
de Bot (1992) and Wei (2009); see also Myers-Scotton (2005). 
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ture” (Myers-Scotton and Jake 2017: 7). The intuition that different types of mor- 
pheme are accessed at different levels during language production will prove cru- 
cial for understanding which types of phraseme, or which phraseme compo- 
nents, are uttered in which language in code-switching discourse. 

The MLF model does not make any predictions about the processing of 
phrasemes. Nevertheless, the following comment shows how the model relates 
to the question of phraseme storage and processing: 


I also see my work as recognizing that explanations lie in linking a theory of language with 
a theory of language processing in a manner similar to the views expressed in Jackendoff 
(2002). Jackendoff stresses the need to consider what aspects of an utterance are in long- 
term memory (content morphemes in my framework) and what aspects can be constructed 
online with working memory. 

(Myers-Scotton 2002: 310) 


Myers-Scotton has not published any work focusing specifically on phrasemes,
but in her discussion of so-called EL islands (full syntactic constituents from the
EL inserted into an ML clause) she mentions that these islands often show phra- 
seological characteristics: 


Many of the Embedded Language islands can be considered collocations, combinations of 
words that often appear together as a single phrase. 
(Myers-Scotton 2006: 263) 


[M]any Embedded Language islands are either formulaic or routine collocations, perhaps 
making them similar to the activation required to access singly occurring forms. 
(Myers-Scotton 2002: 162) 


These comments suggest that phrasemes are likely to be inserted as chunks of 
lexemes from only one language into bilingual utterances. At first glance my data 
seem to confirm this. Most phrasemes are inserted as EL chunks and do not show 
internal mixing at all, among them the vast majority of adverbial and nominal 
phrasemes: 


(1) KL: Not anymore. And {in} einer Hinsicht it’s-äh I think it is is it’s more hygienic.
[K6:730]¹²


12 Most of the examples used in this paper are taken from the database compiled for the
research project presented in Keller (2014). Transcription conventions: “In order to
ensure readability of examples, we added punctuation marks and adopted the following
conventions: German items are roman, English are italic; a slash signals a word- or
sentence-break, a dash connects iterated items. Curly brackets mark ambiguous language
affiliation. Round brackets indicate incomprehensible sections, square brackets set off
meta-linguistic comments and indicate passages left out; [...] Note that we
consistently – even within English utterances – employed German orthography for
hesitation expressions, i.e. äh(m)” (Tracy and Lattey 2010: 57). I added curly brackets
to mark homophonous diamorphs, i.e. elements which could be English or German. Square
brackets following examples contain file and line identification.


(2) AS: ...jetzt is des Tor net zugangen. Und all of a sudden hat der g’schrien ja geh halt
rei’, du Depp!
[AIMI1b:722]


(3) LK: Weil wenn ma’, wenn ma’ sieben Jahr’ lang nicht nicht redet/ first of all, damals
war's no’ net so wie heut', dass du...
[L2:401]


(4) KL: That keeps me going. I’m pretty sure. And der gute Wille. And that's about it.
[K9:467]


(5) KL: Things were then better over here, too, you know? Naja. But der liebe Gott, h-he-he
evened it out.
[K16:58]


(6) TG: ...da ham se bloß das Essen gekriegt und ’n- place to stay!
[T29:949]


However, especially among verb-based phrasemes there is a significant number
of items that do show internal language mixing:


(7) TG: ...because- he made himself {so} wichtig, you know.
[T29:278]


Here, the German phraseme sich wichtig machen (Engl. act the big shot, literally
‘make oneself important’) is rendered partly in English and partly in German.
Examples like (7) suggest that at least verb-based phrasemes are not necessarily
accessed as completely prefabricated, language-specific strings of lexemes. Maybe
they are accessed as strings of lemmas, or as superlemmas, at the conceptual level,
but somewhere along the production process the superlemma must be decomposed and
reassembled drawing on lexemes/morphemes from two different languages. This raises
the question of which elements of a mixed phraseme appear in which language in a
bilingual utterance. Or more precisely: Which elements of a mixed phraseme are
realized in the language the phraseme is drawn from and which elements are
translated, or calqued?

In the pursuit of possible constraints regulating the language distribution in
the surface realization of phraseme components, Backus (2003: 92) suggests that 
“ML morphemes will have semantically basic meanings.” Unfortunately, Backus 
(2003) leaves it to the reader to decide what does and what does not qualify as 
semantically basic meaning. Unrelated to the topic of phraseological units, Wei
(2009: 280-283) regards lemma congruence, i.e. the degree of similarity between
word forms from different languages expressing the same lemma, as the
organizational principle guiding the production of mixed utterances. For him the main
reason for inserting EL content morphemes into an ML frame seems to be
insufficient semantic or pragmatic congruence between lemmas. Likewise,
Myers-Scotton (2002: 20) suggests that “lack of sufficient congruence may explain why
certain structures are avoided or impossible in switching between specific language
pairs.” However, what is sufficient and what is insufficient congruence remains
unclear. Nevertheless, studying code-switching data might shed more light on
the question of which elements are central and which peripheral in lexical en- 
tries, simplex or complex ones: 


[H]ow an EL content morpheme is accommodated by an ML frame tells us something about 
which features characterizing that morpheme (ultimately characterizing its supporting 
lemma) are critical and which may be peripheral in lexical entries. At this stage, we only 
aim to have shown the effects on CS [=codeswitching] of different aspects of lexical struc- 
ture, but we do think it is clear how studying congruence in CS has implications far beyond 
the nature of CS itself. 

(Myers-Scotton and Jake 1995: 1019) 


This is to say that the study of morphosyntactic details in code-switching data 
and its implications is more than just a source for understanding more about the 
possible compositionality of phrasemes. It holds valuable clues to the make-up 
of entries in the mental lexicon. With this theoretical introduction in mind, the 
goal of the study presented in the following section is to show how balanced bi- 
linguals integrate verb-based phrasemes in their everyday conversations. 


4 A Study of Verb-Based Phrasemes in German-English Code-Switching


The examples presented in this paper are based on 732 utterances containing various
types of phraseme of the size of a syntactic constituent, extracted manually
from 50 hours of informal interviews with seven German Americans (see footnote
3). Six of the interviewees emigrated to the US as adults, one at the age of
fourteen. At the time of recording they were 65-87 years of age and had lived in an
English-speaking environment for 42-66 years. After their emigration from Ger- 
many some of the speakers continued to use their variety of German on a regular 
basis, others experienced phases with no or hardly any interaction with other na- 
tive speakers of German. 

The close typological relatedness of English and German, which might pose 
an obstacle to some areas of linguistic research, is a definite advantage for an 
investigation of mixing patterns targeting phraseological material, because the 
high number of cognates and (near-)homophones along with the large overlap 
on the morphosyntactic level provokes a variety of mixing phenomena less likely 
to be found in bilingual data based on typologically more distant languages. 

Phrasemes are notoriously hard to define, and the decision as to whether a
combination of words is phraseological is always to a certain degree
a subjective one (see Howarth 1998: 29). I cannot guarantee that I did not miss 
items that another phraseologist would have wanted to include. To confer a cer- 
tain degree of objectivity, I included only phrasemes listed in major printed and 
online dictionaries of idioms and collocations. 

Out of a total of 732 utterances containing phrasemes in the form of a syntac- 
tic constituent (verb-based and other), 146 (i.e. about 20%) exhibit obvious traces 
of the speaker’s bilingualism, either in the form of code-switching in the vicinity 
of the phraseme or as phraseme-internal language mixing (table 1). 


Tab. 1: The frequency of mixing vs. switching (N=146)


                        mixing         switching
verb-based phrasemes    59    75%      18    27%
other                   20    25%      49    73%

total                   79   100%      67   100%
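
The percentages in Table 1 and the share of about 20% reported above follow directly from the raw counts. A minimal Python sketch of that arithmetic is given here for readers who want to verify it; the variable and function names are illustrative assumptions and not part of the study.

```python
# Raw counts from Table 1 (utterances showing traces of bilingualism, N = 146).
counts = {
    "verb-based phrasemes": {"mixing": 59, "switching": 18},
    "other": {"mixing": 20, "switching": 49},
}
TOTAL_PHRASEME_UTTERANCES = 732  # all utterances containing a phraseme


def column_totals(table):
    """Sum each column (mixing, switching) over all phraseme types."""
    totals = {}
    for row in table.values():
        for column, n in row.items():
            totals[column] = totals.get(column, 0) + n
    return totals


def column_shares(table):
    """Share of each phraseme type within a column, as a rounded percentage."""
    totals = column_totals(table)
    return {
        label: {col: round(100 * n / totals[col]) for col, n in row.items()}
        for label, row in table.items()
    }


totals = column_totals(counts)
bilingual = sum(totals.values())

print(totals)
print(column_shares(counts))
print(round(100 * bilingual / TOTAL_PHRASEME_UTTERANCES, 1))
```

The percentages are computed per column, which is why each column, not each row, adds up to 100%.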


My argumentation builds on the hypothesis that language mixing inside a
phraseme is suggestive of a compositional process. Phraseme-internal language
mixing can be observed primarily inside verb-based phrasemes. Therefore, the
present paper focuses on verb-based phrasemes (N=451) and refers to other
syntactic types of phraseme only for comparative reasons. Very early on during the
research process it became clear that the semantic impact of the verb itself
appears to be a crucial factor in determining the mixing patterns in utterances
containing verb-based phrasemes. Consequently, the target utterances were divided
into two groups. The first group consists of 236 utterances, each containing a
phraseme headed by a verb that adds a clearly discernible semantic component
to the overall meaning of the phraseme (Example: live in the lap of luxury). These
phrasemes will be referred to as VPhr. The second group consists of 215
utterances, each containing a phraseme headed by a light verb. These phrasemes will
be referred to as vPhr. In a vPhr, the semantic core of the phraseme is carried by
the nominal component (Example: be sorry).¹³ The verb does not add clearly
discernible meaning to the overall meaning of the utterance but rather serves the
syntactic function of turning the expression into a predicate (Pottelberge 2007;
see also Allerton 2001; Butt 2003, 2010; Winhart 2005). For the present paper I
included the verbs be, have, make, get from English and sein, haben, machen from
German as heads of light-verb phrasemes. The choice is undoubtedly arbitrary,
and more verbs could be included in this group.


4.1 Phrasemes with a Semantically Salient Verbal Head 


All seven informants produce phrasemes with a semantically salient verbal head 
(VPhr) in both their languages with equal ease and there are hardly any cases of 
transfer or interlanguage forms of the kind found in contexts of foreign language 
acquisition. In monolingual English utterances the speakers use idiomatic VPhr 
that do not have a word-for-word translation (8) as well as idioms which can be 
expressed using the same image in German (9, Sterne sehen). In monolingual
German utterances the speakers use a wide variety of standard and dialect idioms,
some of which they may not have encountered at all since settling in the
United States (10 and 11). This shows that all speakers have a well-developed
active repertoire of idiomatic expressions in both their languages.


13 The German tradition uses the term Funktionsverbgefüge mostly for combinations of light 
verb + noun. As I could not find a difference in mixing behavior between light verb + noun and 
light verb + adjective combinations, I have decided to treat them as one group, focusing on the 
semantic lightness of the verb instead of on the syntactic category of the nominal complement. 
I have also included more complex combinations like be close with s.o., containing a light verb, 
an adjective, a preposition and an external valency slot. 

14 The light verb constructions discussed in this paper should not be confused with the
dummy verb constructions frequently mentioned in works on language contact
(Myers-Scotton and Jake 2015: 428; González-Vilbazo and López 2011). Light verb
constructions are lexicalized phraseological units listed in monolingual dictionaries.
Dummy verb constructions are a type of contact
phenomenon where a light verb is used to integrate foreign lexical material from one language 
into another.


(8) KL: I’m- you know, I’m keeping my fingers crossed.
[K3:16]


(9) KL: I walked right into that door, and fast, because I was in a hurry. I saw stars.
[K22:523]


(10) TG: En Abstecher hier und da und/ den Rahm überall abschöpfen, ne?
[T20:954]


(11) TG: San aa die die, wo die arme Leit alle/ ois abnehme und dann leben wie Gott in
Frankr-Frankreich.
[T1:890]


In addition to phrasemes in a monolingual context the speakers produce various 
forms of overt and covert language mixing in and around VPhr. One form of cov- 
ert language mixing is spontaneous or idiosyncratic calquing, where a phraseme 
(mostly a collocation rather than a true idiom) is rendered as a word-by-word 
translation. 


(12) TG: Wenn ma nach California g'flogen san, des hat ja aa lang g’numme. 
[T16:1144] 


In (12) the Bavarian German hat lang g’numma is a calque of the English
collocation take long. Spontaneous calques are unidiomatic in monolingual standard
usage and are not listed in idiomatic dictionaries.¹⁵ The calques that are produced
by the speakers are limited to a few recurring items which seem to have become 
established within the speaker community. Apart from those few established 
calques, the speakers seem to notice their own spontaneous calques and make 
an effort to repair them: 


(13) TG: Because, for the children's sake you have to bring a- a little sa/ you have to sacrifice 
something. 
[Tel:1121] 


In (13) the speaker first begins to translate the German phraseme ein Opfer bringen 
(lit. bring a sacrifice). The attempt is abandoned, and the speaker starts the sen- 
tence over, using the simplex verb sacrifice, thus achieving a non-phraseological 
but native-like wording. 


15 Traditionally the term calque refers to lexicalized items (English skyscraper > German
Wolkenkratzer). The spontaneous word-for-word translations described here are mostly
idiolectal nonce formations.

Especially if a VPhr has no translation equivalent, we could assume that it
would most likely be embedded as a whole into a clause from another language.
In her code-switching studies Myers-Scotton refers to the insertion of a full EL
constituent into an ML frame as an EL island. She assumes that phrasemes are
frequent triggers for EL islands (2002: 157, 162 and 263). From studies of lexical
borrowing we know that noun phrases and adverbials are borrowed quite easily.
The data confirm this kind of insertion for phrasemes with a noun head or in the
function of an adverbial (see examples (1)-(6) above). The borrowing of verbs is
more complex, as it usually requires the borrowed item to be adapted to the
morphosyntactic requirements of the recipient language (tense, word-order, etc.).
With simplex verbs, borrowing along with morphosyntactic adaptation is still
quite common (to google s.th. > etw. googeln > Er hat etwas gegoogelt). Yet, when
a verb can express its meaning only in combination with at least one lexically
predetermined argument, insertion in the form of an EL island does not occur.
What we do find is a number of code-switches which in all likelihood are
anticipational and triggered by a VPhr (14-15):


(14) TG: Aber ich bin froh. They keep an eye on her, too. Wenn wie/ irgendwie was war, 
die würden ihr helfen. 
[T28:22] 


(15) KL: ...but the situation in Osoppo, I think that really/ that- des is ma sehr nahe
gegangen, I mean I couldn't understand anybody wanting to live like that.
[K1:164] 


In each case the language is switched not only for the VPhr but for the entire 
clause. In (14) the switch-point coincides with the beginning of a new independ- 
ent clause. Planning and production difficulties are obvious in (15), where the 
anaphoric subject that of the switched clause is first uttered in the ML, repeated 
in the ML and then uttered in the EL as des. The decision as to whether a language 
switch was triggered by a phraseme or was due to other factors is undoubtedly 
subjective. Reasonable cues are hesitation¹⁶, self-correction, hedges or
metalinguistic comments, and maybe also the lack of a translation equivalent.


16 Code-switching per se is not concomitant with an increase in hesitation phenomena
compared to monolingual speech (Ehinger 2003). However, in my corpus phrasemes in
bilingual utterances show significantly more hesitation than those in monolingual
utterances. This is particularly noticeable around verb-based phrasemes (bilingual
utterances: VPhr 56% and vPhr 41%; monolingual utterances: VPhr 18% and vPhr 16%).
This suggests significantly higher production costs.

The corpus contains a handful of VPhr where the phraseme as a conceptual
unit is clearly attributable to language A but some components of it are realized
through words or morphemes from language B. The result is a form of overt
language mixing which, for lack of an established term, we will for now refer to as
partial calque. This is rare and produced only by the speaker TG who, compared
to other members of her German-American social group, is most at ease with
mixing her languages:


(16) TG: Und dann war's f- für Freudenmädchen. Sin’ se {in} line gestanden! Die Soldaten,
die Fl- die die Flotte, die amerikanische, war im Hafen.
[T6:284]


(17) TG: Und mein Vater, der ging mal zur Bank in {New York} und hat sich {i-in}-äh line
ge-gestanden, to- get to the teller...
[T29:259]


In both (16) and (17) the underlying phraseme seems to be the English stand in 
line. The verb is realized in German, the perfect tense is selected in accordance 
with German colloquial norm. The nominal component, line, appears in English. 
It is preceded by the preposition in, which in German-English language mixing 
cannot be assigned to either of the two languages. Such elements are referred to 
as homophonous diamorphs (following Clyne 1967) and are often found at switch 
points. 

The last example in this section is a rare and curious form of covert language 
mixing which we can call bilingual contamination. Contamination is a well-docu- 
mented phenomenon affecting phrasemes in monolingual contexts where two 
phrasemes are merged into one (Cutting and Bock 1997; Burger 2015: 26). In our 
case, one of the phrasemes comes from English, the other one from German: 


(18) KL: Ah, des is nett, well, dann gibst ihr viele Grüsse. 
[K8:51] 


In (18) the verb from the German phraseme jmdm. viele Grüße sagen is replaced
by a translation of the English give, which is most probably a transfer of the verbal
component from the English phraseme give s.o.'s love to s.o. The surface lexicali- 
zation is entirely monolingual. What makes this example interesting in the given 
context is that just as in the overtly mixed examples it is the verbal component 
which is calqued. 

So far, we have established that the speakers have a well-developed
repertoire of VPhr in both their languages. They use them in monolingual as well as in
bilingual turns. If during a turn a speaker wants to use a phraseme from the
language which is currently not the ML, he or she switches the language, possibly in
anticipation of the phraseme, for the entire clause. The use of VPhr in bilingual 
clauses in the form of overt mixing is rare and often accompanied by hesitations 
and repairs. In the following section we will look at verbal phrasemes with a se- 
mantically light verb. These show more overt phraseme-internal mixing and thus 
provide more interesting evidence with respect to the question of compositional 
processing. 


4.2 Phrasemes with a Semantically Light Verbal Head 


In this section we zoom in on verb-based phrasemes with be, have, make, get from 
English and sein, haben, machen from German as their syntactic head. Some of 
the phenomena and findings described in the section on VPhr are also applicable 
to vPhr. The speakers use them with equal ease in both their languages, in mon- 
olingual as well as in bilingual turns. Most vPhr occur in monolingual clauses: 


(19) LK: Un’ na sag i, well, i wollt’- Mittag mit dir mache heut, un’ i hab Zeit.
[L2:518]


(20) KL: And afterwards she was sorry that she didn’t buy it.
[K9:118]


As with VPhr, insertions limited to a vPhr alone do not occur. However, in con- 
trast to VPhr, anticipational switching is not frequent either. There are a few 
switches following abandoned calques of more idiomatic vPhr: 


(21) KL: Na, aber die war nicht/ She was not what we call here my cup of tea. 
[K16:38] 


In (21) the entire clause is repaired and also the phraseme is flagged as an item 
specific to American culture by the meta-comment what we call here. Although 
there are no obvious complete calques, there are also a few cases of attempted 
calquing, abandoned mid-sentence. In these cases, it is not the complete clause 
that is started over; rather, the repair is limited to the nominal component of the 
vPhr: 


(22) TG: Is’ die Elsie Eigel noch in gut/ {in} good shape, Elsa? 
[T28:200]


In (22) the English be in good shape¹⁷ is first translated but abandoned at the point
where the speaker would have to assign a German gender-specific adjective end- 
ing to gut. The repair begins with the homophonous diamorph in. In the repaired 
version the verb still remains in the ML, whereas the complete NP is inserted in 
the EL. 

Overt phraseme-internal language mixing in the form of partial calques is 
found with significant frequency (17% of 215 targets) and across speakers (6 out 
of 7 speakers): 


(23) LK: ...und deshalb si/ bin i ja {so} close mit denen. 
[L2:454] 


(24) TG: Hatt’s ihn grad’ runterschlagen können, because- he made himself {so} wichtig,
you know! 
[T29:278] 


In (23) the verb of the English vPhr be close with s.o. is calqued, as is the preposi- 
tion with. The semantically most salient component, the adjective close remains 
in its original English form. Before the English insertion we have the intensifier 
so as a homophonous diamorph. In (24) the verb of the German vPhr sich wichtig 
machen is calqued, as is the reflexive pronoun sich, whereas the adjective wichtig 
remains in its original language. Again, the homophonous diamorph so appears 
between the calqued components and the EL insertion. 

All mixed verbal phrasemes appear to follow one consistent mixing pattern: 
the verb is calqued and the semantic core (mostly a noun or an adjective) appears 
in the original language of the phraseme. The mixing pattern is not dependent on 
the language of the phraseme, English or German. The partial calque in (25) will 
now be discussed in more detail in order to relate this recurrent pattern to the 
theoretical assumptions about code-switching and language processing outlined 
in sections 2 and 3. 


(25) LK: ...wie mer unser/ uns die Hauser angschaut ham, da wollte mer sure mache, dass 
mer e Haus kriege, wo mer e Eckbank neistelle kann. 
[L1:42] 


The underlying phraseme appears to be the English collocation make sure. A
possible German translation equivalent is sichergehen (literally: go sure). Thus, a
conflict on the level of lexical congruence could be expected with respect to the
semantically non-congruent verb rather than the congruent adjective. However,
this is the reverse of what we actually see happening: the speaker chooses to
calque the semantically incongruent verb make as German machen and to leave
the semantically congruent adjective sure in its original form. This suggests that
in partial calques superficial lexical equivalence is not the primary force at work.
So, what exactly is motivating lexical selection during the production of mixed
vPhr?


17 Whether or not to include the copula verb in the phraseme is a complex issue which
for reasons of space is not addressed in this paper (see Fix 1971: 72 and Keller 2014:
195-198).

If there were no morpho-syntactic constraints governing the production of 
mixed utterances, one could imagine the following alternative renderings of the 
phraseme make sure in a bilingual clause with German as the ML (note: for ease 
of explication the dialect from the original is adapted to standard German): 


(a) Da wollten wir make sure, dass...
(b) Da wollten wir sure make, dass...
(c) Da wollten wir sicher make, dass...
(d) Da wollten wir sure make-en, dass...
(e) Da wollten wir sure machen, dass...


The MLF model provides arguments for why versions (a)-(c) should be
dispreferred by a balanced bilingual. The complete EL insertion of the phraseme as in
the hypothetical realization given in (a) violates the Morpheme Order Principle,
which states that word order must come from the ML. According to the rules of
German word-order, the non-finite verb make should be preceded by the
adjective sure. The EL insertion in (b) fixes this problem and follows ML word order.
However, it still violates the System Morpheme Principle: the non-finite EL verb
make doesn't carry the ML infinitive suffix -en.¹⁸ The same holds for the mixed
option in (c), which calques only the semantically congruent adjective and
retains the original but non-congruent light verb. Option (d) is in line with both
MLF principles (word-order and outsiders from the ML) but includes
word-internal language mixing.¹⁹ The combination in (e), i.e. the one actually produced by
the speaker, is the one that optimally solves or integrates congruence issues on
the morphosyntactic as well as on the semantic level: word-order and the
infinitive marker on the light verb come from the ML (German), satisfying the MLF
constraints. The adjective, which carries the semantically salient core of the
phraseme, is retained in its original language and inserted as an EL element into the
clause. It does not carry any late outsider system morphemes and occurs in a
position which does not violate ML syntax.


18 Myers-Scotton and Jake (2017: 10) refer to French infinitive suffixes as early SMs,
based on the observation that in their data French infinitives appear to be inserted
along with their French infinitive suffixes. This does not seem to be so for the
inserted German and English infinitives in the corpus I used. There are instances where
the German infinitive suffix is omitted, e.g. in “Na, let's fahr-Ø nach England, wegen
deine Geschwister und die alle” (Keller 2014: 219). Conversely, when an English
infinitive is adapted to German, an infinitive ending is added, e.g. “Zwei languages
zusammen-put-en!” (Münch and Stolberg 2005: 74). Therefore, I am inclined to assume
that the German infinitive suffix is a late outsider, which – as all other outsiders –
conveys grammatical rather than semantic information.
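
The constraint reasoning applied to candidates (a)-(e) can be made explicit in a small Python sketch. This is an illustrative encoding only, not part of the MLF model or of the study: the class, the field names and the constraint labels are assumptions, and the infinitive suffix -en is treated as a late outsider, as argued in footnote 18.

```python
from dataclasses import dataclass

ML, EL = "German", "English"  # matrix and embedded language of the clause in (25)


@dataclass(frozen=True)
class Candidate:
    label: str
    surface: str
    order: tuple             # surface order of the two phraseme components
    adjective_lang: str      # language of the adjective (semantic core)
    verb_stem_lang: str      # language of the light-verb stem
    has_ml_infinitive: bool  # True if the verb carries the German suffix -en


def violations(c):
    """List the constraints (hypothetical labels) that a candidate fails."""
    failed = []
    # Morpheme Order Principle: ML order puts the adjective before the non-finite verb.
    if c.order != ("adjective", "verb"):
        failed.append("morpheme order")
    # System Morpheme Principle: the late outsider -en must be supplied by the ML.
    if not c.has_ml_infinitive:
        failed.append("system morpheme")
    # Dispreference noted in the text: an ML suffix attached to an EL verb stem.
    if c.has_ml_infinitive and c.verb_stem_lang != ML:
        failed.append("word-internal mixing")
    # The semantic core should stay in the original language of the phraseme.
    if c.adjective_lang != EL:
        failed.append("semantic core calqued")
    return failed


CANDIDATES = [
    Candidate("a", "make sure", ("verb", "adjective"), EL, EL, False),
    Candidate("b", "sure make", ("adjective", "verb"), EL, EL, False),
    Candidate("c", "sicher make", ("adjective", "verb"), ML, EL, False),
    Candidate("d", "sure make-en", ("adjective", "verb"), EL, EL, True),
    Candidate("e", "sure machen", ("adjective", "verb"), EL, ML, True),
]

for c in CANDIDATES:
    print(f"({c.label}) {c.surface}: {violations(c) or 'no violation'}")
```

Under these assumptions only (e), the form the speaker actually produced, passes every check. Candidate (a) fails on word order and on the missing suffix, which matches the remark that (b) "still" violates the System Morpheme Principle.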


5 Discussion 


The examples provided in section 4 show that the code-switching constraints pro- 
posed by Myers-Scotton in her MLF model also hold for phraseological units. So, 
the study of phrasemes in code-switching lends further support to the model. 
However, as the subject of this paper is the processing of phrasemes rather than 
the predictive power of a code-switching model, the crucial question is: what can 
the behaviour of phrasemes in code-switching tell us about the internal make-up 
and processing of phrasemes? 

Mixed vPhr all show the same distribution of languages: The (light) verb is
calqued and produced in the ML of the clause. The nominal component is
inserted in its original language.²⁰ The order of the elements follows the syntactic
requirements of the ML. This pattern integrates two challenges in an optimal way.
First, retaining the nominal element carrying the semantic weight of the
phraseme in its original language serves as a cue for the language-specific multi-word
sequence stored in the mental lexicon and helps to convey the intended
propositional content to the hearer. Second, calquing of the semantically light verb
allows integration of a phraseme from language A into a clausal frame from
language B in a manner that does not violate the grammatical rules of language B as
proposed in the MLF model. The pattern is repeatedly produced by six out of the
seven speakers and is thus not an idiosyncratic feature. The mixing pattern leads
to the following hypothesis concerning the roles of semantics and syntax in the
production of mixed phrasemes in classic code-switching:


The lexeme carrying the semantic core of an EL phraseme needs to be
produced in its original language as a cue to the language-specific superlemma
stored in the mental lexicon. Semantically lightweight elements can be
calqued in order to satisfy ML morphosyntactic requirements.


19 Word-internal mixing resulting from the addition of a language-B system morpheme to a
language-A content morpheme is commonly observed among early bilinguals during
simultaneous acquisition (Lanza 1997). The adult speakers who participated in our study
seem to avoid word-internal mixing and use it mainly to achieve a comic effect.

20 This distribution of languages matches findings presented by Marian (2009: 172), who,
without reference to phrasemes, writes that in her data verbs tend towards covert mixing
(calquing), whereas nouns are more often overtly inserted. She attributes this to the
stronger syntactic relations of verbs with other syntactic constituents in a clause.


This hypothesis is an empirically derived synthesis of Myers-Scotton's (2002: 240) 
assumption that the primary function of an EL is to supply content morphemes 
in mixed constituents and Backus's (2003: 92 and 123) claim that in mixed con- 
stituents based on conceptual units ML elements will have semantically basic 
meanings. It also supports the claim that “basic vocabulary” tends to be calqued 
whereas "specific vocabulary" will be inserted as an EL form (Backus and Dor- 
leijn 2009: 92). 

With respect to language processing, the question now is: How do we get 
from a language-specific superlemma entry to a mixed phonological realization? 
With no explicit reference to phrasemes, de Bot offers the following, fairly
vague, suggestion concerning the language-sensitivity or language-specificity of the
levels of speech production:


[The conceptualizer] is probably partly language-specific and partly language-independ- 
ent. Further it is hypothesized that there are different formulators for each language, while 
there is one lexicon where elements from different languages are stored together. The out- 
put of the formulator is sent to the articulator, which makes use of a large set of non-lan- 
guage specific speech motor plans. 

(De Bot 1992: 1) 


Also without reference to phrasemes, Myers-Scotton and Jake (1995: 987) suggest 
that at the conceptual level, language-specific lemmas are selected and sent to 
one of the language-specific formulators, which then adds the required predi- 
cate-argument structure, word-order and inflections. 

Let us assume that the initial²¹ step is the same for lemmas and superlemmas:
Guided by the intent of the speaker, a language-specific superlemma is selected 


21 We can avoid the unresolved question of relative timing of the sub-processes if we adopt 
Jackendoff's Parallel Architecture model, according to which lexical/semantic and morphosyn- 
tactic processes run in parallel and influence each other (Jackendoff 1998: 39). 
at the conceptual level. According to Sprenger et al. (2006: 167), each component
of the superlemma is accessed individually from the mental lexicon, but through 
one common idiom node. This complex of individual but connected lemmas is 
sent to one of the language-specific formulators. As we want to explain the lan- 
guage distribution in mixed EL phrasemes, we are interested in the case where 
an EL phraseme is sent to an ML formulator.²² The formulator is supposed to pro-
ject ML argument structure onto the elements it receives from the conceptualizer
and to convert lemmas into lexemes and then word forms which can be sent on 
to the articulator. Under the current assumption that bilinguals might have two 
separate grammars but only one joint lexicon, it might not be all that surprising 
that a lemma from one language could be realized by a word-form from the other 
language, even if this lemma is part of a phraseme. However, the choice of surface 
language does not appear to be random. Judging from the mixing patterns we 
find in the data, the choice of word-forms at the formulator level appears to be 
subject to two constraints, one conceptual-semantic and one morphosyntactic in 
nature (Tab. 2). 


Tab. 2: Assigning surface language to phraseme components 


Conceptualizer   A superlemma representing a complete language-specific phraseme
                 is selected from the mental lexicon.

Formulator       Morphosyntactic constraint (Myers-Scotton’s SMP):
                 Phraseme components which host late outsider system morphemes
                 activated only at the level of the formulator must come from
                 the ML.

                 Conceptual-semantic constraint:
                 Semantically salient phraseme components must come from the
                 language with which the phraseme is affiliated in monolingual
                 speech.
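
As a minimal sketch of how the two constraints in Tab. 2 interact, the following Python fragment treats each constraint as a filter on the set of admissible surface languages for one phraseme component. The class `Component`, the function `admissible_languages` and the labels "EL" and "ML" are illustrative assumptions; they are not part of the MLF model or of the study's method. An empty result stands for the case in which both constraints apply but demand different languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One component of a phraseme (hypothetical representation)."""
    label: str                  # e.g. "noun", "light verb"
    semantically_salient: bool  # carries the semantic core of the phraseme
    hosts_late_outsider: bool   # carries a late outsider system morpheme


def admissible_languages(component, phraseme_language, matrix_language):
    """Return the set of surface languages both constraints allow."""
    allowed = {phraseme_language, matrix_language}
    if component.semantically_salient:
        # Conceptual-semantic constraint: keep the phraseme's own language.
        allowed &= {phraseme_language}
    if component.hosts_late_outsider:
        # Morphosyntactic constraint (SMP): late outsiders come from the ML.
        allowed &= {matrix_language}
    return allowed


noun = Component("noun", semantically_salient=True, hosts_late_outsider=False)
light_verb = Component("light verb", semantically_salient=False, hosts_late_outsider=True)
salient_verb = Component("verb", semantically_salient=True, hosts_late_outsider=True)

print(admissible_languages(noun, "EL", "ML"))          # {'EL'}
print(admissible_languages(light_verb, "EL", "ML"))    # {'ML'}
print(admissible_languages(salient_verb, "EL", "ML"))  # set()
```

In this sketch the light-verb case yields the mixed pattern (noun in the EL, verb in the ML), whereas a semantically salient verb that also hosts a late outsider leaves no admissible language as long as word-internal mixing is excluded.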


The morphosyntactic constraint, Myers-Scotton’s System Morpheme Principle, 
holds for classic code-switching in general. The semantic constraint is specifi- 
cally formulated for phrasemes in code-switching. The two constraints, applica- 
ble in parallel rather than consecutively, offer a theoretical explanation for the 


22 If an ML phraseme is sent to the ML formulator, we will get a monolingual utterance. If an EL 
phraseme is sent to the EL formulator, the result will be an EL island. An ML phraseme sent to 
an EL formulator would not be an option, as it would render the basic idea of having an ML completely
moot.
recurring overt language mixing pattern in vPhr. Also, they can help to explain
the apparent resistance of VPhr to internal mixing. When an EL vPhr is inserted 
into an ML clause, the semantically salient element (mostly a noun or an adjec- 
tive) must be realized in the original language of the phraseme in order to satisfy 
the semantic constraint. The semantically light verb, which in the actual phonetic 
string carries a late outsider, can be adapted to ML morphosyntactic require-
ments by way of calquing (or “literal translation”) to satisfy the syntactic con-
straint. However, if a speaker wants to make use of a VPhr which is not part of the
language he or she is currently using as the ML, the verb is semantically salient 
and thus needs to be realized in the original language of the phraseme. But it also 
carries a late outsider. Without word-internal mixing the verb cannot be adapted 
to ML morphosyntactic well-formedness conditions. Therefore, the only solution 
appears to be anticipational switching of the entire clause. The observed re- 
sistance of VPhr to internal mixing might suggest that more idiomatic phrasemes 
are processed holistically. I don't think this is the case. Rather, the “grammar” of
classic code-switching (outsiders have to be supplied by the ML) prevents overt
mixing of idiomatic VPhr.²³ The few instances where the speakers start with the
production of a mixed or calqued VPhr are quite instructive: The observation that
these attempts are often abandoned and rephrased indicates that the speakers are
aware of the “unlawfulness” of such translations of phrasemes. And it suggests
that the bilingual language monitor checks for idiomaticity not necessarily before 
but rather while assembling a phraseological unit from individual language-spe- 
cific lexemes. The abandoned calques show that more idiomatic elements of a 
phraseme can be calqued individually as well but that the result is rejected by the 
language monitor. 


6 Conclusion 


Of 451 verb-based phrasemes analysed for the present study, 20% show overt or
covert language contact phenomena, either inside the phraseme (language mix- 
ing) or in its direct vicinity (language switching). The analysis has shown that 
phrasemes are subject to the same morphosyntactic constraints as free combina- 
tions of words proposed in the MLF model and the 4-M-Model (Myers-Scotton 


23 If we look at cases of attrited (or attriting) phrasemes, which for reasons of space have been 
left out of the discussion, we can observe that in cases where an automatized production route 
is no longer available, VPhr also appear to be assembled from individual components (Keller 
2014: 251-253). 
2002; Myers-Scotton and Jake 2017): word order and late outsider system mor-
phemes, i.e. the inflectional morphemes, which only serve a grammatical func-
tion, have to be supplied by the ML of the clause. Consequently, the verb, which
in German and English carries a late outsider, has to be realized in the ML, at least 
in speaker groups where word-internal mixing is dispreferred. In contrast to the 
verb, nominal or adjectival complements can appear in the EL. This distribution 
of languages based on word-class is reflected in a recurring mixing pattern which 
is mostly found with inserted EL phrasemes containing a light verb but is also 
occasionally observable in more idiomatic phrasemes: the noun or adjective 
which carries the semantic core of the phraseme is realized in the EL, whereas the 
verb is calqued and produced in the ML of the clause. 

The observation that phrasemes in code-switching can be composed of ele- 
ments from different languages also supports the Superlemma Theory (Sprenger 
et al. 2006), which claims that the components of a phraseme are accessed indi- 
vidually, but through one common idiom node at the conceptual level. The find- 
ings suggest that at the level of the formulator, the production of phrasemes is 
determined not only by morphosyntactic code-switching constraints but also by 
phraseme-specific semantic considerations: The semantic core of a phraseme 
must be produced in the original language of the phraseme, while functional el- 
ements, including light verbs, can also be realized in a different language. 

A promising next step to test the theoretical modelling of language distribu- 
tion or language assignment to surface lexemes in mixed phrasemes proposed in 
this paper would be an extended analysis of more utterances with semantically 
light verbs as their syntactic head. But of course, there is a lot more to explore in 
the context of phrasemes and code-switching, for example the status of the cop- 
ula verb (included in or excluded from the phraseme) or de-automatisation as 
observable in attrition of phrasemes. Also, the influence of internal and external 
valency or of semantic compositionality could be analysed in more detail in order 
to further enhance our understanding of storage and processing of phrasemes. 
Part IV: Earlier/Historical Stages of Language
Development 


Marie-Luis Merten 
Insights into a Changing Communal 
Constructicon 


Legal Writing in the Late Middle Ages and Early Modern Period 


Abstract: The paper examines legal writing in the Late Middle Ages and Early 
Modern Period from a diachronic perspective. The underlying corpus consists of 
Middle Low German law codifications of the period from 1227 until 1567. Applying 
a constructionist approach, the focus lies on evolving and changing construc- 
tions (of legal writing). The corpus-based examination reveals insights into the 
changing communal constructicon. This communal constructicon can be seen as 
a repertoire of constructions shared by legal writers (of that time). Due to observ- 
able language elaboration processes, this repertoire — modelled as a socio-cogni- 
tive network — becomes increasingly complex and literate over time. Language 
elaboration is a type of language change closely linked to written usage. In this 
context, the obvious nexus between legal writing and language elaboration plays 
a crucial role. 


1 Introduction 


From a diachronic point of view, the paper aims to discuss (vernacular) legal
writing in the Late Middle Ages and Early Modern Period, focusing on emerging
and evolving form-meaning pairs (constructions in the sense of Croft 2001) as
literate entities. From the perspective of corpus analysis, form-meaning pairs
can appear as formulaic patterns in different corpora. Literate constructions
emerge via processes of language elaboration (Maas 2008: 333). This type of
language change is closely linked to writing and contexts surrounding the pro- 
duction of written documents. Especially legal texts — e. g. urban law codifica- 
tions — can reveal interesting insights into this phenomenon. Within the period 
of investigation (1200-1600), legal texts have to meet growing requirements: 
They need to be as explicit and unambiguous as possible (Hiltunen 2012), but 
they must also construe an increasing number of varying legal situations and cir- 
cumstances in a schematic and often compacted way (Tophinke 2009: 175—176, 
2012). Subsequently, numerous literate form-meaning pairs coping with these de-
mands evolve. Obvious examples are prepositional constructions (propositional
integration) or different types of complex sentences (linking propositions to one 
another). Here, constructional changes and constructionalizations can be seen. 
Moreover, they serve as markers for a changing communal constructicon, a socio- 
cognitive network of form-meaning pairs shared by legal writers of that period. 
Since the focus of research is on Middle Low German legislative documents, the 
paper not only deals with new data but also takes into account a much neglected 
historical language (Hundt and Lasch 2015: 3). Around the year 1600 - before the 
written language shift towards Early New High German was completed -, this 
historical stage of contemporary and nowadays (mostly) only spoken Low Ger- 
man had been highly elaborated. Following Maas (2009: 170), Middle Low Ger- 
man in those days was "fit to replace Latin in all linguistic domains". In the Early 
Modern Period, Middle Low German was a far-reaching and widespread written
language: It functioned as the lingua franca of the Hanse area. 

To capture the historical-diachronic dimension of elaboration processes, the 
corpus underlying this study consists of 13 urban law codifications from between 
1227 and 1567. Overall, it includes 244,140 words, whereby the shortest legal text 
(statutes of Werl of 1324) consists of only 1,400 words, the longest one (the urban 
law of Cleve from 1424/40) of about 70,000 words (for further information about
the corpus cf. Merten 2018: 286-289). The qualitative research design includes 
the following steps and sub-objectives: (1) exploring the texts in their structural 
character and specific functionality (to regulate urban life), (2) identifying form- 
meaning pairs (in their function in legal texts) and (3) tracing their development 
over time (constructional change). Although the study is primarily qualitative in 
nature, frequencies of occurrence and changes of frequency can serve as im- 
portant indicators: They indicate which recurrent patterns are most likely to have 
constructional status and where/when changes have presumably taken place. 
Beyond that, identified constructions and formulaic patterns - e. g. multiword 
expressions as construction evoking elements - are investigated regarding their 
social dimension. Here, it is relevant to consider an evolving legal style (Schwyter 
1998: 190; Coupland 2007), for example, providing an explanation for retaining 
complex constructions that remain unchanged (e. g. theme indicating construc- 
tions). 
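
Because the texts differ greatly in length (from roughly 1,400 to about 70,000 words), raw counts of a pattern are not directly comparable across codifications. The following Python sketch shows one common way of normalizing counts to occurrences per 1,000 words. It is an assumption for illustration only: the study does not describe its counting procedure, the function name `per_thousand_words` is hypothetical, and the token strings are placeholders rather than corpus data.

```python
import re


def per_thousand_words(text: str, pattern: str) -> float:
    """Occurrences of `pattern` per 1,000 whitespace-separated tokens."""
    tokens = text.split()
    if not tokens:
        return 0.0
    hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return 1000 * hits / len(tokens)


# Placeholder strings standing in for a short and a longer text.
short_text = "x y PAT z"
long_text = "x y PAT z " * 3 + "x y z w"

print(per_thousand_words(short_text, r"\bPAT\b"))  # 250.0
print(per_thousand_words(long_text, r"\bPAT\b"))   # 187.5
```

The longer placeholder text contains more hits in absolute terms (three against one), but the normalized value is lower, which is the kind of difference that raw counts would hide.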

The structure of this paper is as follows: Section 2 introduces language elab- 
oration as a fundamental process of language change brought about by written 
usage. Cultural, cognitive and structural aspects of textualization phenomena are 
discussed and, in so doing, the language-historical setting is recapitulated. Sec- 
tion 3 provides an overview of key aspects of (diachronic) Construction Grammar 
research. The focus lies on (1) processes of constructionalization and construc-
tional change and (2) the emergence and nature of communal constructica. In
this context, pragmatic associations between certain linguistic entities — con- 
structions — and typical usage events play a crucial role. Section 4 offers a closer 
look at the evolvement of Middle Low German legal writing (constructions) dur- 
ing the period of investigation. Selected examples are presented and discussed: 
firstly, restrictive constructions to construe exceptions to previous legal norms 
(section 4.1), and, secondly, theme indicating constructions (section 4.2). Section 
5 briefly summarizes the main aspects of the paper and gives an overview of
ongoing work and a preview of further work (DFG-funded research project InterGramm).


2 Cultural, Cognitive and Structural 
Textualization: Language Elaboration and Legal 
Writing 


The Late Middle Ages and Early Modern Period are mainly characterized by a 
growing importance of (vernacular) writing and written documents. Written texts 
play an increasingly integral role in cultural memory (Assmann 1992: 52). As a 
pivotal form of mediation, written records preserve knowledge detached from 
contexts of its production. Not only are written texts essential for recording 
knowledge but also for its transmission and dissemination.¹ Law serves in this
context as the domain par excellence: In the (Late) Middle Ages, urban life be- 
comes increasingly complex, social relations grow and more and more legal 
claims are made. At this point, law serves as an (establishing) institution to con- 
trol and regulate social situations and legal matters — e. g. how to behave as heir, 
seller, purchaser and so forth and what kind of punishment to expect when com- 
mitting different crimes. Consequently, the older Latin law (not tailored to urban 
needs) is replaced by vernacular legal writing (Deutsch 2013; Wallmeier 2013; 
Warnke 1999).² A rising number of urban communities start to write down their


1 See Ong (1986: 38) in this context: “Knowledge itself is not object-like: it cannot be transferred 
from one person to another physically even in oral communication, face-to-face, or a fortiori in 
writing. [...] Since knowledge cannot be physically transferred verbally from one human person 
to another but must always be created by the hearer or reader within his or her own conscious-
ness, interpretation is always in play when one listens or when one reads.”

2 Maas (2009: 169) points out the following: “It took a long time to elaborate the vernacular 
languages so that they could articulate complex literate texts, and Latin served as the model: 
own municipal law, using already existing codifications of other cities as a model.
This growing use of the vernacular language for formal occasions and functions 
marks the starting point for the gradual rise and conventionalization of literate 
form-meaning pairs — viewed as processes of constructionalization and construc- 
tional changes as they are discussed in the following section in greater detail. 

However, several further developments need to be pointed out as they have 
an impact on the constructional/structural dimension of legal writing, e. g. 
changing practices of reception: While the oldest Middle Low German law texts 
were meant to be read out to the public, the newer ones were meant to be read in 
silence to oneself (Erben 2000: 1585). This development from being read out to 
the urban community towards silent reading — as an adjustment of perspective — 
is accompanied by several structural developments (Szczepaniak 2015: 109). It 
becomes increasingly important to produce texts “that will be consistent and de- 
fensible when read by different people at different times in different places” 
(Chafe 1982: 45). Correspondingly, structures supporting comprehension that is
independent of face-to-face contexts (have to) evolve (Maas 2009: 166): complex
sentence types (e. g. subordinate constructions), modifying prepositional sche- 
mata, written text organizing constructions, several attributive techniques result- 
ing in complex noun phrases and so on: 


Written discourse develops more elaborate and fixed grammar than oral discourse does be- 
cause to provide meaning it is more dependent simply upon linguistic structure, since it 
lacks the normal full existential contexts which surround oral discourse and help determine 
meaning in oral discourse somewhat independently of grammar. 

(Ong 1982: 37) 


Moreover, legal writing undergoes a noticeable professionalization within the pe- 
riod of investigation: Legal writers become experienced practitioners sharing 
their established and further elaborating routines of producing legislative texts. 
Urban chancelleries turn progressively into institutions of professional writing. 
At the same time, the proportion of citizens able to read increases over the Late 
Middle Ages (Maas 2001: 85; von Polenz 2000). 

Consequently, numerous textualization phenomena at different levels start 
to emerge. The interwoven and corresponding dimensions are: the cultural, the 
cognitive-conceptual and the structural level (Schwyter 1998; Raible 1998). Struc- 
tural textualization relates to the language-internal dimension. It manifests itself 


Latin texts were ‘sparring partners’ for writers struggling to cope with these tasks. They had to 
calque Latin structures until a flexible literate grammar was also available in languages like Ger- 
man.” 
in the shape of grammatical and lexical elaboration (Koch and Oesterreicher
2007; Weber 2010), but is linked to the higher cognitive-conceptual textualiza- 
tion. Cognitive-conceptual textualization phenomena present themselves as (1) a 
“conceptual change of a whole discourse tradition from spoken to written” 
(Schwyter 1998: 190), but they also refer to (2) an increase in literate thinking 
(Raible 1998: 175). Writing has a huge impact on thinking: it enables inten-
sive reflection on the text being produced, its planning and revision without com-
municative pressure. On the whole, structural and cognitive-conceptual textual-
ization are driven by but also function as driving forces for a superordinate
cultural textualization as “a culture’s increasing use and acceptance of writing 
and literate modes” (Schwyter 1998: 190). As a consequence, literate societies 
(Goody 1986: 26) emerge, a form of community mainly based on and shaped by 
literacy (manuscript culture). 

In sum: As has already been stated at several points, language elaboration is 
closely linked to writing and is a continuous development. Already existing con- 
structions change and new literate form-meaning pairs emerge. From a grammat- 
ical point of view, constructionally complex schemata belonging to the formal 
register appear and evolve. The term literate refers to a linguistic coding in the
form of sentences; the (functional) focus is on “addressing a generalized ‘other’,
e. g. not presupposing a cooperative other for making sense of what is said, not
relying on the situational context” (Maas 2009: 165-166).³ When we look at the
range of evolving construction types, we can distinguish at least three phenom- 
ena of grammatical language elaboration (Merten 2018: 273): (1) the genesis of 
complex sentences/subordination (syntactically complex constructions), (2) the 
evolvement of integrating constructions (propositional integration, text compres- 
sion) and (3) the gradual rise of constructions supporting the organization of writ- 
ten text (e. g. several coordinating constructions). 


3 Maas (2001: 94) differentiates between orate and literate structures as follows: “One dimen-
sion of these style differences is explicitness or formality. At one extreme it is a strictly context-
bound structure of utterances under control of face-to-face interaction, leaving most of what is
said implicit. At the other extreme it is context-free articulation of an utterance, submitted to the
formal demand of completeness with every piece to be articulated as a grammatical sentence,
and permitting the reproduction of identical utterances in different context.”
3 The Emerging Communal Constructicon:
Constructionalization and Constructional 
Changes 


Before discussing the rise and change of constructions within Middle Low Ger- 
man legal writing, I will briefly introduce what constructions are and give a short 
summary of the most important aspects concerning constructions in use. Con- 
structions are (cognitively stored) pairings of a meaning with a (mainly verbal) 
form. Moreover, their idiomaticity and/or high frequency in usage has to be em- 
phasized: 


Any linguistic pattern is recognized as a construction as long as some aspect of its form or 
function is not strictly predictable from its component parts or from other constructions 
recognized to exist. In addition, patterns are stored as constructions even if they are fully 
predictable as long as they occur with sufficient frequency. 

(Goldberg 2006: 5) 


Modelled as “formulaic, fixed sequences” (Bergs and Diewald 2008: 1), construc-
tions differ in terms of complexity, schematicity and productivity (Goldberg 2006:
5). They range from single lexical entities (words) or morphemes with grammati- 
cal meaning to complex constructions with several schematic slots (e. g. argu- 
ment structure constructions) being filled in usage. In this regard, not only does 
their form vary from simplex to complex but their meaning or function can also 
be to a greater or lesser extent specific/abstract (Ziem 2018: 9; Croft 2001: 17). 
Ziem and Boas (2017: 275) point out that so-called CEEs (construction evoking el- 
ements) can make up the lexical anchors of varying constructions. In view of par- 
tially specific constructions - the focus of section 4 -, CEEs often fill those slots 
that are lexically fixed. In some cases, complex multiword expressions serve as 
evoking elements (Merten 2018: chapter 4). Overall, CEEs - whether lexical or 
grammatical/schematic — assume a decisive role in the constitution of construc- 
tional gestalts. They can thus project subsequent components or lead to a rein- 
terpretation of preceding elements. 

New form-meaning pairs arise due to (changing) communicative needs and 
are formed by communicative circumstances. Through language use, both form 
and meaning of a construction are subject to variation and change (Hoffmann 
and Trousdale 2011; Hilpert 2011, 2013; Filatkina 2014). In this regard, the major- 
ity of constructionist approaches - e. g. Cognitive Grammar, Radical Construc- 
tion Grammar and Cognitive Construction Grammar - can be described as usage- 
based models (Hoffmann and Trousdale 2011: 4), following the guiding maxim 
that usage of a language shapes its structure and engenders change (Bybee 2010:
194; Langacker 2010: 94): 


Unlike most other modern theories of linguistics, cognitive linguistics is a usage-based 
model of language structure (Langacker 1987: 46; 2008: 220). In other words, we posit no 
fundamental distinction between ‘performance’ and ‘competence’, and recognize all lan-
guage units as arising from usage events. Usage events are observable, and therefore can
be collected, measured, and analyzed scientifically (Glynn 2010: 5-6). In this sense, cogni- 
tive linguistics has always been a 'data-friendly' theory, with a focus on the relationship 
between observed form and meaning. 

(Janda 2013: 2) 


Consequently, realized constructions (constructs) are the “locus of linguistic
innovation and subsequent change” (Trousdale 2013: 511). Emerging construc-
tions may allow for a varying construal. This notion from Cognitive Grammar
highlights the ability to construe one situation in different (linguistic) ways (Lan- 
gacker 2015: 120). Linguistic meaning always encompasses both conceptual con- 
tent and construal: 


Content and construal are equally important aspects of the processing activity that consti- 
tutes linguistic meaning. They cannot be neatly separated (indeed, the selection of content 
is itself an aspect of construal). The rationale for distinguishing them is that the apprehen- 
sion of a situation is more than just a representation of its elements. While content and con- 
strual are ultimately indissociable, the distinction draws attention to the flexibility of con- 
ception and the variability of expression even in regard to the same objective 
circumstances. 

(Langacker 2015: 121) 


For example, one conceptual content (CELEBRATE) can be construed as process 
(celebrate) or thing (celebration); a shift in profile⁴ takes place:


(a) We celebrated all night. 
(b) The celebration was great. 


Depending on the communicative intentions, propositions and their relations 
can be construed differently - e. g. as a complex prepositional phrase (thing pro- 
file) or as a subordinate clause (process profile) - with the result that divergent 
images emerge (Langacker 2008: 55). Construal has to be viewed as a “multifac- 


4 Cf. Langacker (2008: 98): “[W]hat determines an expression’s grammatical category is not its
overall conceptual content, but the nature of its profile in particular.”
eted phenomenon” (Langacker 1999: 5), encompassing the dimensions of speci-
ficity, focusing, prominence and perspective. All of these conceptual factors have
in fact “manifestations in other sensory modalities” (Langacker 2015: 121), they 
rely on fundamental perceptive phenomena. In this context, Langacker (2015: 
121) underlines the “primacy of vision and the grounding of cognition in percep- 
tual and motor interaction”. 

Back to constructional change: In a diachronic perspective, as has already 
been pointed out, new constructions emerge and already existing ones are used 
in new contexts or change with regard to frequency, form or function. They are 
adapted to changing communicative circumstances (Tomasello 2003: 14). These 
processes can be accompanied by changes in degree of schematicity, productivity 
and compositionality. According to Traugott and Trousdale (2013: 20-21), two 
types of processes can be distinguished in this context: Constructional change 
has to be differentiated from processes of constructionalization. Whilst construc- 
tional change only affects one dimension of a construction (Hilpert 2011: 69, 
2013), constructionalization involves the creation of a new form-meaning pair. 
Newly emerged constructions can make a huge impact on the overall boundary 
structure of the linguistic system. Viewing this structure as a (mental) network of 
related constructions — the so-called constructicon (Goldberg 2003: 220) -, 
emerging entities (form-meaning pairings) form new nodes and can thus change 
the whole network architecture. Both processes have in common that they are 
gradual in nature. Normally, only one constructional feature changes at a time 
and the observable steps are (very) small.⁵


A succession of small discrete steps in change is a crucial aspect of what is known as 'grad- 
ualness' (Lichtenberk 1991b). We understand 'gradualness' to refer to a phenomenon of 
change, specifically discrete structural micro-changes and tiny-step transmission across 
the linguistic system [...]. Synchronically it is manifest in small-scale variation and ‘gradi-
ence’ [...]. This means that at any moment in time changing constructions contribute to gra-
dience in the system.

(Traugott and Trousdale 2013: 74-75) 


Focusing on the underlying corpus, it has to be pointed out that within one single 
text (as a synchronic form of the language/practice under investigation), the dif- 
ferent stages of constructional change can surface as contextually determined 
variants (Heine and Narrog 2010: 409). Constructions of different ages exist side 


5 "While there is no predetermined order for reanalyses at different constructional levels, the 
hypothesis is that pragmatic changes precede semantic changes; and these meaning changes 
precede formal changes" (Trousdale 2012: 543). 
by side and may be realized by one legal writer in one textual record. Hence, the
distinction between synchrony and diachrony is not sharp, rather, synchrony 
and diachrony have to be viewed as an integrated whole (Bybee 2010: 105; Lang- 
acker 2010: 94). Here, the frequency of related constructions can reveal which 
variants are more or less entrenched and conventionalized at a given time. Gen- 
erally speaking, a (relatively) high frequency of occurrences mirrors the typicality 
of the respective structure. From a cognitive point of view, high frequency (on 
token and/or type level) indicates a high degree of (constructional) entrench- 
ment. A construction is very likely to be an entrenched and conventionalized en- 
tity at the time of its frequent usage (Bybee 2010: 81; for a further discussion con- 
cerning the connection of frequency, entrenchment and conventionalization see 
Schmid 2010, 2015). However, Traugott and Trousdale (2013: 5) note that the de- 
cision “what level of frequency is sufficient for pattern storage and entrenchment 
is problematic” and has to be modelled as relative and gradual (see also Lan- 
gacker 2010: 94). This is particularly the case “in historical work where the tex- 
tual record is often minimal” (Traugott and Trousdale 2013: 5). 

Especially in the framework of usage-based Construction Grammar, con- 
structions as schematized (often formulaic) patterns of language use can be 
thought of as entities also including information about the communicative usage 
events they are abstracted from (Langacker 2008: 458; Cienki 2015).⁶ Construc-
tions can be enriched by pragmatic associations (Schmid 2014: 253). Repetitive
language use contributes to the routinization of pragmatic associations: Due to 
the recurrent usage of form-meaning pairs in relatively stable communicative cir- 
cumstances, a linkage between linguistic structures and “occasions when they 
were uttered” (Schmid 2014: 253) is established and becomes entrenched. This 
coinedness in discourse complies with the notion of “pragmatische Prägungen”
discussed by Feilke (1996). Certain constructions can be “fitted to particular so- 
cial actions” (Fox 2007: 312) performed linguistically, e. g. headline constructions 
in the context of (written) text production (Merten 2018: 443-451). In this respect, 
linguistic entities – single words, complex constructions, etc. – act as “keys
adapted to different social contexts” (Maas 2001: 94). Inversely, they function as
contextualization cues (Gumperz 1982: 131), evoking different contexts of speak- 
ing and writing when realized in actual usage events. They serve, for example, as 
key components for doing legal writing in the Late Middle Ages and Early Modern 
Period. 

Legal writers in the Late Middle Ages and Early Modern Period – as a (histori-
cal) community of practice – make use of a shared (and evolving) repertoire of


6 Cf. also Kristiansen and Dirven (2008); Hollmann (2013). 


234 —— Marie-Luis Merten 


constructions that are more or less enriched by pragmatic associations. These 
form-meaning pairings are linked to the usage event of creating legislative texts. 
As the circumstances and communicative constellations of the production and 
reception of legal texts alter, the repertoire of legal writing constructions under- 
goes a change as well. As pointed out in section 2, it becomes more complex and 
literate. A growing number of literate construal techniques evolves and the com-
munal⁷ constructicon – the socio-cognitive network of form-meaning pairs
shared by legal writers – is elaborated.

By using these constructions, legal writers present themselves as members of 
a (professional) community and enact a certain social identity⁸. In this view, com-
munity membership is based on shared expertise/skills (Clark 1996: 102; Croft
2000: 939) referring to “the same in different individuals” (Schatzki 2002: 18): 


The term community does not imply necessarily co-presence, a well-defined, identifiable 
group or socially visible boundaries. It does imply participation in an activity system about 
which participants share understanding concerning what they are doing and what that 
means in their lives and for their communities. 

(Lave and Wenger 1991: 98) 


4 Legal Writing in the Late Middle Ages and Early 
Modern Period: Examples and Insights 


The main function of legal texts, especially of urban law codifications, lies in reg- 
ulating and controlling urban life. Primarily, they have to construe what happens 
or has to be considered if a specific act is committed (e. g. murder, robbery, adul- 
tery, etc.) or an event has taken place (e. g. death resulting in an inheritance 
case). Accordingly, conditionality (if X then Y) is a highly relevant semantic rela- 
tion in this context. Several conditional construal techniques in their usage and 
change are discussed in Merten (2017, 2018) or Tophinke (2009, 2012). But an 
evolving distinctive legal style seems also to appear in the form of restrictive (sec- 


7 See Croft (2000: 94) for the similar notion of communal lexicon as a “specialized vocabulary 
for a particular domain of shared expertise”. 

8 Social identities are co-constructed within communities of practice: “In this view, as individ- 
uals interact with others in shared social practice, their actions — including common ways of 
speaking - shape and are shaped by their social identities” (Mallinson and Childs 2005: 1). 
tion 4.1) and so-called theme indicating constructions (section 4.2). In the follow-
ing sections, the focus will be on partially specific constructions and formulaic⁹
expressions such as it si denn (dat) (‘unless’) or were it sake dat (‘was it the case 
that’) that serve as construction evoking elements (section 3). While the different 
slots of the respective constructions can be filled with varying contents/proposi- 
tions depending on the legal scenario to construe/to regulate, these CEEs are rel- 
atively stable entities — ‘relatively stable’, because they are — on closer inspection 
— (also) subject to change. 


4.1 Construing Restrictive Relations: it ne si dat- and it si 
denn (dat)-Constructions 


Restrictive relations concern exceptions to (often) previously coded content. The 
oldest restrictive construal possibility realized in the (older) investigated legal 
texts is the exceptive clause. It is also discussed for Middle High German by Paul 
et al. (2007: 402). Structural properties evoking this construction (type) are the 
mononegation — often expressed by the negation particle ne/en — and the sub- 
junctive (finite verb of the respective clause). These features are highlighted by 
bold print in the following examples: 


(1) he heuet sine hant verloren he ne moge se weder kopen weder dat gerichte 
‘he has lost his hand, unless he can buy/purchase it against the law court’ 


(Braunschweig 1227) 


(2) Dhes ne scal ene de uoghet nicht weldeghen . he ne winne it mit rechte
‘Therefore, the bailiff shall not put him into possession, unless he wins it with law
(justifiably)’ 
(Stade 1279; I:7) 


In (early) Middle Low German, the mononegation can serve as a marker for re- 
striction/exceptions because the contemporary negation of propositions is com- 
monly realized by polynegation (cf. Breitbarth 2014 for the change of negation in 
Middle Low German). This exceptive clause as a restrictive construal technique is 
much more grammaticalized but less explicit than the emerging it ne si dat-con- 
struction — whereby si can be replaced by were (‘was’). The it ne si dat-construc- 


9 Cf. Wray (2002, 2008). 




tion gradually evolves during the 13th/14th centuries as the polynegation for ne- 
gating (propositions and so forth) is replaced by mononegation. In consequence, 
the it ne si dat-construction represents an increasingly used linguistic option for 
construing an exception to preceding legal norms. 

The multiword string it ne si/were dat — as lexical component of the relational 
construction - can be classified as a formulaic entity. Needless to say, the writ- 
ing/spelling variation in historical times has to be considered in this context. The 
lexical entity encompasses the expletive it, the negation particle ne/en, a (mainly) 
subjunctive finite form of the copular verb sin (‘to be’) and the primary subjunc- 
tion dat. Presumably, due to repetition in usage and thus frequency effects, 
chunking and fixing of a specific form of the exceptive clause - the recurrent in- 
stantiated predicative structure it ne si dat X — has taken place here: 


Once word sequences such as be going to or in spite of have become frequent enough to be 
accessed from cognitive storage and produced as units, they begin to become autonomous 
from the words or morphemes that compose them. Both chunking and increase in auton- 
omy are gradual processes, and the formation of a chunk (a storage and accessing unit) 
does not necessarily mean that speakers are no longer aware of the component parts and 
their meanings. That is, a sequence of words can become automated as a chunk through 
usage while a transparent relationship with the words in other contexts is maintained. 
(Bybee 2011: 71) 


The formulaic expression it ne si/were dat is used as a relating entity. Owing to its 
fixing and (presumable) reinterpretation, we can assume one overarching and 
non-compositional grammatical meaning that is undoubtedly more than the sum 
of its component parts (expletive it + negation particle + copular verb + subjunc- 
tion dat). This functional word group construes the constraining relation between 
a (previous) content X (schematic slot I) and the content Y following this multi- 
word string (schematic slot II). Subsequently, the syntagma it ne si/were dat + 
content Y (exception) is typically placed at the end of more or less complex para- 
graphs and articles (example 3). Sometimes, modifying adverbs are integrated 
into this multiword string, for example the form also (‘therefore/thus’, example 
4): 


(3) Hebbet lude lengu+ot in samender ha(n)t - sterft der en / de len eruen heuet - de bin-
nen iren iaren sin - wat men van ireme gu+ode vp nemet - dat scal men in weder
gheuen wanne se to iren iaren komet - It ne were dat kost vp dat gu+ot ghe draghen 
were - de men redeliken bewisen mochte - der men nicht vmme ghan ne mochte - des 
scolen se ire del ghelden 
‘If people have fief together: If one of them dies who has fief heirs who are under their 
years: What one takes from their property, one shall give them back this property, 
when they come to their years. Unless (it is the case that) costs for the property in-
curred, one can prove in accordance with the legal norm, one could not avoid, they
have to pay their share of it’ 

(Goslar 1350; Guardianship, § 9; Lehmberg 2013: 155) 


(4) Hebben ſe ouer nene kyndere to ſamende. vnde is de man voruluchtich. ſo nemet ſe ere
medegyft to voren vt. van deme anderen ſchal me ghelden de ſchult. It en ſi alzo dat
ſe mede ghelouet hebbe. Wante denne mo-et ſe mede ghelden
‘If they have no children together and the man is volatile, then, she removes her dowry 
afore. One shall pay the debt by the other. Unless (it is thus the case that) she has 
promised. Because then she has to pay’ 
(Oldenburg 1400; § 46) 


It ne si/were dat provides not only the lexically fixed content of this restrictive 
form-meaning pair, it also serves as a construction evoking element (discussed in 
the previous section 3). Furthermore, it ne si/were dat can function as the profile 
determinant of a conditional construal technique: Legally recorded exceptions 
are often accompanied by construing what has to happen when the exceptional 
case comes to pass. In this way, it ne si/were dat relates a causing entity (excep-
tional case) and a thereby caused/initiated entity (consequence of the excep-
tional case) – both part of the construction¹⁰ – as can be seen in both examples 3
and 4.

We can also find evidence for it ne si/were dat-constructs combined with the 
exceptive adverb denne/dan (‘except/but’). To be precise, these instances show a 
fusion of the two form-meaning pairs - the it ne si dat- and the denne-construc- 
tion. Accordingly, the restrictive meaning is intensified as two restrictive tech- 
niques fuse: 


(5) Ersloghe auer vser borghere en enne gast dod, dat scholde in sodanem rechte bliuen, 
alse dat wente her to ghestan heft - also dat de rad dar nene veste vmme don scholde - 
it en were denne dat de rad den gast ghe velighet hedde 
‘But, if one of our citizens slays a visitor, that has to remain in such a law as this has 
remained until now, namely, that the council therefore should undertake no fortifica- 
tion. Unless (it is the case that) the council had protected the visitor’ 

(Goslar 1350; Breach of the Peace, § 146; Lehmberg 2013: 325) 


(6) Item Soe wylch(er) vand(en) ii vand(en) Raide dye sy dan dair inne wroegeden(n) dye 
en sall dan dair voir nyet neen seggen(n) then we(re) dan dat sy wroegeden(n) van 
segge worden offt van hoer(e)n seggen(n) 


10 These more or less schematic entities — e. g. causing entity and caused entity - are part of the 
complex construction. They (can) show a specific word order (verb-final, verb-second, etc.) that 
can alter over time. 




‘Item Which one of the two council commissioners, who they reprimand then, he can- 
not make an appeal to it, unless (it is the case that) they reprimanded for a complaint 
or due to hearsay’ 

(Duisburg 1518; plate 3, section 12; Mihm and Elmentaler 1990: 116) 


Examples like these can be found until the early 16th century. In retrospect, they 
illustrate an intermediate stage in the ascent of the it si denn-construction — the 
most recent restrictive form-meaning pair relating two processual entities. Its 
constructionalization “can be seen to have arisen from a number of small local 
changes” (Traugott and Trousdale 2013: 29), so-called “pre-constructionalization 
constructional changes” (Traugott and Trousdale 2013: 36). According to the ob- 
servations presented, a multiple inheritance leads to the creation of this (new) 
form-meaning pair (Trousdale 2013: 511): For at least two functionally related 
constructions (it ne si/were dat- and denne-construction) are involved in this con- 
structionalization and they transmit formal and functional characteristics. In ad- 
dition, the final stage is characterized by (i) the loss/elimination of the negation 
particle and (ii) the primary subjunction dat becoming only an optional element. 
It can be realized (examples 7 and 8), but it can also be left out (example 9): 


(7) Wat averst syn liggende Gru+ende / und stahnde Erve / de mach ein Vormunder nicht 
verkopen / yt sy denn / dat der Kinder u+eterste Noht datsu+elvige erforderde 
‘But what his lying properties and standing bequest, a guardian is not allowed to sell 
those, unless the children’s extreme misery requires it’ 
(Dithmarschen 1567; Article 71, § 2) 


(8) §.4. It schall averst neen Vormunder der Unmu-endigen Huse / Ho+efe / Ackere / und
andere liggende Gru+ende verkopen; 
§.5. It ys denn / dat der Unmu+endigen Vader den Kindern so vele Schu+elde nah- 
gelahten / dat de beweglyken Gu+eder tho betalinge dersu+elven nicht konden tho- 
langen 
‘§.4. No guardian shall sell the house, courtyard, field and other lying property of the
underaged; 
§.5. Unless the underaged’s father left the children so much debt that the moveable 
property was not enough for its payment’ 

(Dithmarschen 1567; Article 22, § 4 and 5) 


(9) NEEn Koep ys voter bestat+endich tho holden / yt sy denn darby ein gewis Koepgeld 
bestemmet 
‘No purchase is valid, unless a certain purchase money is defined thereby’ 
(Dithmarschen 1567; Article 62, Introduction) 


The it si denn (dat)-construction can very likely be categorized as a form-meaning 
pair typical of (historical) legal writing (Paul et al. 2007: 403). In this regard, it 
contributes to a legal style evolving in the period of investigation. As a pragmatic
association, this form-meaning pair is part of the (contemporary) communal con-
structicon (16th century). The development described from the exceptive clause
to the it si denn (dat)-construction mirrors the changing repertoire of the legal 
writers. At different periods in time, different construal techniques serve as pre- 
ferred strategies to construe exceptions - a highly relevant function in the pro- 
duction of legislative texts. 


4.2 Indicating an Overall Theme: the were it sake dat- 
Construction and Related Form-Meaning Pairs 


In the more recent urban law codifications investigated, a certain construction 
type (cluster of related constructions) indicating and introducing an overall 
(text/article) theme is used frequently. Here, the literate entity ‘two-dimensional 
written text’ plays a crucial role. The evolved constructions are tailored to this 
medial form designed for silent reading (as a visual-cognitive practice). From a 
diachronic viewpoint, a primarily conditional construal technique serves as 
source for (one micro-construction of) this construction type/construction clus- 
ter. But, with regard to the conditional usage, its text position is less restricted 
(source construction). This conditional form-meaning pair – with the CEE were
it/dat sake dat meaning ‘if’ – can occur at different places in the article/para-
graph.


(10) Vortmer wer dat sake dat eyn dem anderen scult gheue vor rychte eder vor den borg- 
heren, [...] bu+ode dey eyn eyt, so eyn solde de andere vort den ghynen vor eyme an- 
deren reychte vmme dey sake nicht mer sculdeghen 
‘Further, is it the case that one accuses the other in court or in front of the citizens 
[...], does he take an oath, then, the other shall henceforth not accuse this one for that 
issue in another court anymore’ 

(Werl 1324; § 24)


In contrast, theme introducing/indicating constructions have a relatively fixed
position, which is restricted solely to the beginning of a new article or paragraph: 
At the beginning of the new article/paragraph the lexically fixed entity occurs as 
CEE (e. g. were it sake dat) followed by the theme/topic-slot that can be filled by 
diverging propositions (processual profile). This constructional characteristic 
can be viewed as textual coinedness (Feilke 1996: 281–282) implying that this for-
mal feature of theme introducing/indicating constructions – their fixed position
– is coined with respect to the written text and its characteristics. In addition, an
expansion of function can be seen in these cases. Especially in the land law of 
Dithmarschen - the most recent legislative text investigated -, the were it sake 
dat-construction (see examples 11 to 13) not only construes a conditional relation-
ship but marks the beginning of diverging articles and introduces their topic(s).
Lehmann (1988: 187) has already pointed out that initial position usually indi- 
cates the topic of sentences: 


Just as elsewhere, sentence-initial position usually identifies the topic (more precisely, the 
exposition, in the terms of Lehmann, 1984: ch. V.5) of the sentence. This is well-known from 
left-dislocated NPs. It is perhaps not so well known that a whole subordinate clause may 
also provide a topic for the following main clause. 

(Lehmann 1988: 187) 


In our context, this observation refers not only to sentences but also to more com- 
plex textual entities. Moreover, the recurrent multiword string were it sake dat (= 
lexically fixed elements) serves as a salient entity in view of its visual perceptibil- 
ity; it can be easily perceived (and found in texts when searching for new para- 
graph beginnings). All in all, this construction type - besides serial numbering of 
the articles and so forth - is an aid to quick orientation in different parts of com- 
prehensive text. This functional extension is brought about by writing and the 
two-dimensionality of written records. In the following examples from the land 
law of Dithmarschen, modifying adverb constructions (e. g. ok- (‘also’) and aver- 
constructions (‘but’)) can be combined with this complex form-meaning pair: 


(11) SO ener Schaden lede dorch syne Kleder / he worde gesteken / effte gehowen / so 
schall men ehm den Schaden behteren und nicht de Kleder. 
§.1. Were yt Sake / dat Se ehm ock anders syne Kleder thorehten hedden / dat be-
wyslyk were / so scho+elen Se ehm desu+elven betahlen / wat se wehrt syn.
§.2. Und de Kleder scho-elen tho der Behoef / ...
‘So/when one suffers harm through his clothes, he was stung or hit, then, one shall
pay for his damage and not for the clothes.
§.1. Is it the case that they also have to refund his clothes in another way, what is
proven, then, they shall pay him these, what they are worth.
§.2. And, for this purpose, the clothes shall ...’

(Dithmarschen 1567; Article 101) 


(12) §.1. Were yt ock Sake / dat dar wol synem Volcke Schuld geve / ux emme jennig Guht
/ dat ehm entfehret were / dat schall he dohn / ...
‘§.1. Is it also the case that there [one] likely blames his folks, for that property that
was stolen from him, he shall do this ...’ 
(Dithmarschen 1567; Article 83, § 1) 


(13) §.1. Were yt averst Sake / dat he yt nicht bewysen konde / scho+elen beyde Ko+eper
und Verko+eper schweren / dat eer Koep recht und redelyk / su+ender allen falsch 
und bedreechlicheit gegahn sy / so hoch alse se seggen ... 
‘§.1. But is it the case that he cannot prove it, both, buyer and seller, shall swear that
their purchase has happened fairly and honestly without any falsehood and decep-
tion/imposture, as high as they say ...’

(Dithmarschen 1567; Article 67, § 1)


Although at first glance not closely related to each other, other multiword expres- 
sions evoking theme indicating constructions in the land law of Dithmarschen 
(1567) are begeve yt sick dat (‘does it come to pass that’), droge yt sick to dat (‘does 
it happen that’) and befunde it sick dat (‘does it take place that’). However, on 
closer inspection, they have a number of things in common: Typically, they share 
a finite verb in initial position (with the meaning potential ‘happen/come to 
pass’), the subsequent expletive yt/it, the reflexive pronoun sick and the primary
subjunction dat. On this schematic level, these CEEs are in fact identical and in 
turn related to the more common were it sake dat as a fixed entity. The difference 
lies in the use of a construction with full verb(s) (zutragen (‘happen’), befinden 
(‘happen’), etc.) and processual profile vs. the use of one with a copular verb and 
the (legal) noun sake (‘legal case’) that allows a thing profile (section 3). 
Interestingly, conditional construal techniques such as the efft-construction 
(‘if’-meaning, a conditional relationship is profiled) merge with theme indicat-
ing/introducing constructions on the construct level. These fusion examples can
be interpreted as supporting evidence for the specific functionality of the latter 
constructions. As has already been pointed out, it highlights the beginning of a 
new paragraph and is not only responsible for construing a conditional relation- 
ship. This function is realized by the efft-construction (examples 15 and 16): 


(14) Effte ein Tutege Kranckheit halven tho Rechte nicht kamen konde. 
Begeve yt sick / dat einer de tu+egen scholde / so schwack und kranck werde / dat 
he uht syner Behu+osinge vor Recht nicht kamen konde / so schalde Vaget... 
‘If a witness could not come to the court due to illness. 
Does it come to pass that one who shall testify becomes so weak and ill that he can- 
not leave his house in order to come to court, then, the bailiff shall ...’ 
(Dithmarschen 1567; Introduction into Article 11) 


(15) §.10. Efft yt sick ock begeve / dat enner / de Schaden gewunnen hadde / tweerley
Worde fo+ehrede / alse / dat he den Schaden des Avendes geve up eenen / und des 
Morgens up enen andern / ys dat bewyslyk / so schall he ... 

‘§.10. If it also comes to pass that one who suffered damage conducts twofold words,
such as that he attributes the damage to one person in the evening and to another in 
the morning, if this can be proven, then, he shall ...’ 

(Dithmarschen 1567; Article 94, § 10) 
(16) §.3. Effte sick denn befunde dat se ehres Gelovens nicht rein / und in ungo+ettlyker
Schwermery steken / sick ock eines beteren nicht underwysen / und vanehren Erdohm 
wolden affleiden lathen / de scho+elen ahne Middel des Landes verwyset werden. 
‘§.3. If it takes place that they [are] not pure of their belief and are found in ungodly
rapture, [they] also do not want to be open to conviction and disabused of their error, 
they shall be expelled from the country without funds.’ 

(Dithmarschen 1567; Article 2, § 3)


All in all, it is striking that these complex lexical entities are maintained and con-
served although less complex alternatives – at least with a conditional meaning
– exist. As Hoffmann and Trousdale (2011: 5) note, “if the same content can be
expressed by two competing structures and one of these is easier to process than 
the other, then the simpler structure will be preferred in performance”. This con-
sideration supports the very likely assumption of functionally charged form-
meaning pairs in the case of the were it sake dat-, begeve it sick aver-constructions
etc. discussed above (theme indicating, visualizing beginning of paragraph/arti- 
cle). In addition, the dimension of social value/meaning (Elspaß 2015) within a 
language community seems to play a decisive role. In contrast to existing less 
complex alternatives (efft- or wanne-constructions), these multiword expressions 
as CEEs (were it sake dat, befinde it sick dat, etc.) seem to offer a socio-pragmatic 
added value (in their simply larger gestalt, their distance marking use of subjunc-
tive verb forms and so forth).¹¹ In this context, legal writers are alluding to com-
plex forms associated with skillful writing and prestigious language use in the
Early Modern Period (Schwitalla 2002). In so doing, they underpin their profes- 
sionality with the use of highly literate constructions that are part of their com- 
munal constructicon and which have evolved through the medium of writing. 


5 Conclusion 


The paper shed light on language elaboration processes in Middle Low German 
legal writing, whereby an underinvestigated historical language challenging 
(common) grammar theories became the central subject of investigation. Due to 


11 However, the conservative nature of law needs to be considered in this context: “The law has
to be revised constantly so as to keep it up to date with social change. This need for revision,
however, does not mean that the language of the law will automatically be updated at the same
time. On the contrary: since the law is essentially a conservative institution, it follows that its
language is relatively conservative as well. It is therefore not likely to change very quickly.”
(Hiltunen 2012: 50)
the diachronic research interest, an approach was adopted which was capable of
coping with phenomena of language in transition and both formulaic (lexical) 
expressions and more complex form-meaning pairs between fixedness and vari- 
ability: (Diachronic) Construction Grammar allows for the detailed description 
and explanation of changing form-meaning pairs, of observable relations be- 
tween different constructions (as well as the development of these relations) and 
elements evoking those linguistic construal techniques (see also Filatkina 2018). 
Especially, the idiomatic/non-compositional nature of (grammatical) meaning 
was emphasized. Important processes in language change/usage such as reinter- 
pretation and chunking as well as the idea of tiny-step transmissions played a 
crucial role. Source and target constructions were taken into account — whenever 
possible with regard to the underlying corpus - and a case of multiple inheritance 
where more than one form-meaning pair was involved in the creation of a new 
construction was discussed. 

In particular, the focus was on the functionality of certain evolving construc- 
tions with regard to written text(s). Emerging constructions induced by writing 
can be viewed as literate entities and seem to be more complex than orate form- 
meaning pairs. They have to be interpretable independent of context and, thus, 
include all information necessary for comprehension. The nexus of language 
elaboration processes and legal writing was pointed out. Subsequently, attention 
was drawn to the adaption of vernacular language to literacy on the basis of ur- 
ban law codifications produced over a period of more than 300 years (1227 to 
1567). The evolving literate form-meaning pairs are increasingly tailored to (the 
production of) written texts structured for silent reading. In this legal context, 
important construal techniques concern conditional, causal or restrictive rela- 
tions and so on. The combining of propositions and the changing ways of relating 
them linguistically are an interesting object of investigation. Certain construc-
tions seem to be bound to legal writing; they can be modelled as pragmatic asso-
ciations and are part of the changing communal constructicon of the historical
community of legal writers. 

InterGramm¹² – a Digital Humanities project at Paderborn University – con-
tinues this investigation of language elaboration processes in Middle Low Ger-
man. Although the focus is on changing constructions, the underlying corpus
consists of considerably more texts. Additionally, besides linguists, both compu- 
tational linguists and computer scientists are part of the project team. By apply- 
ing a human-in-the-loop approach, we combine phases of (human) expert anno- 
tation and machine learning. For the automatic construction tagging, we espe- 


12 For further information: https://www.uni-paderborn.de/forschungsprojekte/Intergramm/ 




cially use (lexical) construction evoking elements. These important elements are 
(relatively) easily identifiable and useful for hypothesising what kind of construc- 
tion might be instantiated. Especially in the historical context, annotating lin- 
guists have to be aware of a comparative fallacy that emerges when researchers 
fall into the error of investigating one language by comparing it to another, for 
example, their native language. The historicity of the language under investiga- 
tion has to be given serious consideration. Historical languages must be viewed 
on the basis of their own common structures/constructions, characteristics and 
functionalities. 
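
The idea of using lexical construction evoking elements as anchors for automatic tagging can be illustrated with a minimal sketch. It is not the InterGramm pipeline, whose implementation is not described here; the pattern inventory, the labels and the function name tag_cees are hypothetical and only cover the spelling variants attested in examples (3) to (13) above. The sketch assumes that spelling variation can be approximated by small sets of alternative forms and that one optional modifying adverb (e. g. alzo, denne, ock, averst) may intervene. Its hits are hypotheses for expert annotation, not final analyses.

```python
import re

# Hypothetical, hand-written CEE patterns (an assumption for illustration,
# not the project's tag set). Each pattern tolerates a few attested
# spelling variants and at most one intervening adverb.
CEE_PATTERNS = {
    "restrictive: it ne si/were dat": re.compile(
        r"\b(?:it|yt)\s+(?:ne|en)\s+(?:si|sy|were)\s+(?:\w+\s+)?dat\b",
        re.IGNORECASE,
    ),
    "restrictive: it si denn (dat)": re.compile(
        r"\b(?:it|yt)\s+(?:si|sy|ys)\s+denn\b",
        re.IGNORECASE,
    ),
    "theme-indicating: were it sake dat": re.compile(
        r"\bwere?\s+(?:it|yt|dat)\s+(?:\w+\s+)?sake\s*/?\s*dat\b",
        re.IGNORECASE,
    ),
}


def tag_cees(text):
    """Return (offset, label, matched string) for every CEE candidate in text."""
    # Map long s to round s; the replacement keeps the string length,
    # so offsets still point into the original text.
    normalized = text.replace("ſ", "s")
    hits = []
    for label, pattern in CEE_PATTERNS.items():
        for match in pattern.finditer(normalized):
            hits.append((match.start(), label, match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    samples = [
        "It en si alzo dat se mede ghelouet hebbe",
        "yt sy denn darby ein gewis Koepgeld bestemmet",
        "Were yt averst Sake / dat he yt nicht bewysen konde",
    ]
    for sample in samples:
        print(sample, "->", tag_cees(sample))
```

In a human-in-the-loop setting, candidate lists of this kind would be checked by annotating linguists, and the patterns would be revised wherever they miss attested variants or over-generate.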
Christian Pfeiffer & Markus Schiegg
Religious Formulae in Historical Lower- 
Class Patient Letters 


Abstract: This article examines the use and functions of religious formulae in his- 
torical lower-class letter writing. The data analysed are taken from the Corpus of 
Patient Documents (CoPaDocs), a new corpus of 19th- and early 20th-century 
texts written by patients from German psychiatric hospitals. An illustrative inves- 
tigation of the occurrence and usage of religious formulae allows us to differenti- 
ate between explicit and implicit uses and to discuss instances of variation and 
modification. The functional analysis exhibits predominantly argumentative 
functions of religious formulae, but also parallelisation, expression of a shared 
ethos and a text-structural function. Finally, we discuss to what degree the for- 
mulae found are corpus specific, for example, resulting from the religious delu- 
sions of some patients, or whether they could also be connected to the idea of a 
cross-linguistic repertoire of formulaic writing. 


1 Introduction 


Den gemeinen Mann in Deutschland hört man bei solchen Gelegenheiten, wo er sich beson-
ders feierlich erheben möchte, häufig in den Bibelton verfallen. Bei dem Volke ist die Bibel-
sprache zur Sprache des täglichen Lebens, zur Sprache des innigeren Familienverkehrs, ja
selbst zur Sprache der Liebe geworden, denn selbst in die Liebesbriefe des Volkes, gerade
je ernsthafter es der ungeübte Schreiber meint, tritt um so leichter diese biblische Färbung
ein [...].

(Mundt 1844: 159–160)¹


In our paper, we discuss the results of a case study on the use and functions of 
religious formulae in 19th and early 20th century letters written by patients from 
psychiatric hospitals. In the past, the focus of research on formulaic language has 


1 Translation: ‘On occasions where he wants to elevate himself in a particularly festive manner, 
one can hear the common German slipping into the biblical tone. For the common folk, biblical 
language has become the language of everyday life, the intimate language inside families, and 
even the language of love, hence even in the people's love letters, this biblical tone joins in,
which happens, the more wholeheartedly the unroutined writer means it.’


@ Open Access. © 2020 C. Pfeiffer, M. Schiegg, published by De Gruyter. This work is licensed 
mainly been on standard languages and close-to-standard varieties. As a conse-
quence, we see a lack of studies based on text types from non-standard domains,
even more so with regard to historical varieties. With the rise of new approaches 
to the history of language, such as the concept of a language history ‘from below’ 
(Elspaß 2005), private letters written by less-experienced writers are now re- 
garded as a particularly valuable source of linguistic data. This is due to the writ- 
ers’ limited literacy: while they obviously do have a command of basic writing 
skills, they are usually not familiar with producing conceptually written language,
or the language of distance in the sense of Koch and Oesterreicher (1985).
Such letters hence provide a good opportunity to reconstruct the non-standard 
varieties of common people and to get “as close as we can [...] to authentic oral
registers” (Elspaß 2012: 45). This also holds for investigations into formulaic language.

However, studies on the use of formulaic language in historical letters written by
inexperienced writers are quite rare. This may be explained by the dominant paradigms
in the research on formulaic language mentioned above,
but also by a lack of available texts. Despite an increase in schooling and thus 
writing competence during the 19th century (Elspaß 2005: 76), lower-class people 
did not usually have a reason to write, either in their private lives or in their pro- 
fessions as farmers, craftsmen, etc. And if they ever wrote to each other, their 
texts would not usually be transmitted to us, but kept by their families and 
thrown away after some time. 

Situations of separation, however, increased the amount of text production 
by lower-class people. Emigration played an important role in the 19th century 
and we have evidence of a large number of letters written by emigrants, especially 
emigrants to North America, sent home to their families (Elspaß 2005). Soldiers
were also far away from their families and wrote home (Langer 2013). Another 
context of separation in which letter-writing became relevant was that of the psychiatric
hospitals that were established systematically throughout Europe during the
19th century. After their hospitalisation, writing letters permitted the patients to 
continue personal communication with their spouses, relatives and acquaint- 
ances. Hence, patients at these institutions often wrote letters home, but also to 
the doctors of the hospitals and other recipients. These letters, however, were of- 
ten not sent out but censored and put into the patients’ files as proof of their men- 
tal illnesses (cf. Schiegg 2015). The research project ‘Flexible Writers in Language 
History’ in Erlangen, Germany, is currently compiling the first corpus of about 
2,000 of those letters and other texts written by 19th- and early 20th-century patients
from German psychiatric hospitals.² This corpus is the empirical basis of
our study. 

Taking into account the results of other studies, we expect the patients to of- 
ten resort to formulaic language that, among other things, could support their 
text production and strengthen their arguments (Elspaß 2005: 174). Furthermore, 
as religion was a central part of 19th-century everyday life, particularly for rural 
people from the lower classes, we also assume that their texts will occasionally 
adopt religious elements (cf. the introductory contemporary quotation). The cen- 
tral purpose of our paper is thus to analyse the usage and functions of religious 
formulae in the CoPaDocs letters. Since research in the field of ‘language in religion’
generally shows a lack of studies with a pragmatic orientation (Lasch and
Liebert 2014: 478), our study might also be relevant from this point of view. 

The paper is structured in the following way. In section 2, we delimit our con- 
cept of a religious formula and summarise the results of existing research on reli- 
gious formulaic writing by lower-class people. In section 3, we describe our cor- 
pus and the methods used to identify religious formulae in the letters. Sections 4 
and 5 discuss the results of our case study. Section 4 gives an overview of the 
occurrence and usage of religious formulae in the corpus with regard to explicit 
versus implicit uses, to variation and creative modifications. In section 5, we analyse
different functions of religious formulae in the CoPaDocs letters, with a focus
on argumentative contexts. The paper ends with a summary of our main findings
against the background of existing research on 19th-century private letters.


2 Religious Formulaic Language in Lower-Class 
Letter Writing 


2.1 Religious Formulaic Language 


Before analysing the uses and functions of religious formulae, it is essential to 
define our concept of the term religious formula. Particularly when dealing with 
historical data, it is helpful to use a broad definition of formulaicity (Filatkina 
2018: 164). We therefore take the often-cited working definition by Wray (2002: 9) 
as a starting point and consider formulaic any "sequence, continuous or discontinuous,
of words or other elements, which is, or appears to be, prefabricated: that


2 See the Corpus of Patient Documents (http://copadocs.de, accessed July 5, 2019).
is, stored and retrieved whole from memory at the time of use, rather than being
subject to generation or analysis by the language grammar". Expanding this approach,
we additionally integrate the view advocated by Stein (1995: 57-58) that
pragmatically fixed one-word-utterances can also be regarded as formulaic. The 
scope of formulaic language thus ranges from one-word-utterances with a specific
pragmatic function through multi-word units and fixed sentences to formulaic
texts (Filatkina 2018: 30). 

A definition as such, however, does not permit a reliable extraction of formu- 
laic material from a text corpus (cf. Wray 2009). To identify formulaic language 
in texts, we also need an operationalisation of the defining features, i.e. inde- 
pendent criteria that mark a given sequence as prefabricated and hence formu- 
laic. In this context, Wray and Namba (2003: 28) offer eleven diagnostic criteria 
that can be derived from peculiarities of formulaic units in terms of the four di- 
mensions form, meaning, function, and provenance. Most of these criteria, how- 
ever, cannot be operationalised objectively either. Rather, it is their function to 
assist and support the researcher's introspective judgement, to help articulate the 
basis of his or her intuition and to provide insight into possible biases (Wray 
2009: 41). The final decision on the status of an item, however, remains subjective 
and based on the researcher's intuition. 

The second question to be addressed is under which circumstances a given 
formulaic item can be regarded as religious. A variety of approaches (semantic, 
discursive, intertextual etc.) could be chosen here.³ For the purpose of our study,
we will focus on the provenance aspect, regarding formulaic items as religious if 
they are associated with a particular religious source or context. Possible sources 
here include texts from the scriptures, but also popular textual sources from other 
religious contexts such as prayers, hymns, liturgies, rites, and texts from the catechism.
From a theoretical point of view, narrowing the definition to the aspect of provenance
is certainly a simplification; this reduction, however, is a necessity for a case
study like ours. 

To be classified as a religious formula in our article, it is not essential that the 
wording used in the patient letter should be completely identical with an availa- 
ble source text. We can assume that writers did not usually have the particular 
text at hand when writing. They normally quoted from memory rather than from 
the original text. Consequently, we also include cases in which the formulation 


3 According to Lasch and Liebert (2014: 477), characteristics of the religious domain are dis- 
tinct objects, a reference to transcendence, and a more or less elaborated metaphysics. 
itself is novel but clearly derived from or associated with something that is formulaic
in its own right.⁴ Thus, we have a continuum of usages, ranging from
word-for-word formulae, fully identical with an available version of the original 
text, through formulae with individual deviations, to allusions and paraphrases, 
i.e. intertextual references which do not try to recapitulate the original text but 
merely refer to the content of a certain religious formula (cf. Lange 2012: 100-101; 
Bartolini 2012). In all these cases, the intertextual character of the formula can 
either be made explicit or remain implicit (see section 4). 


2.2 Research Overview: Religious Formulaic Language in 
Lower-Class Letter Writing 


Formulae are and have been essential parts of letters. In the nineteenth century, 
letter writing manuals enjoyed great popularity, and forms of address as well as 
closure formulae were considered the ‘touchstone’ (Baasner 1999: 23) of any official
letter. These manuals, however, do not seem to have had significant influence
on lower-class writers (Elspaß 2005: 195).⁵ Nevertheless, comparative research
on textual sources from different European countries has revealed that
lower-class writers in particular relied heavily on textual routines and formulaic 
patterns acquired from model letters and other texts (Elspaß 2012: 45). Rutten and 
Van der Wal (2013) have identified a strong correlation between the frequency of 
formulae and the social class of a writer. They explain the increased use of formulae
by ‘ordinary people’ by the fact that the use of formulae was “convenient
to lesser-skilled writers” (2013: 45), for whom writing was not an everyday practice.
It helped these writers solve communicative problems in the written code
(Elspaß 2005: 192).

Several of the formulae that research has identified in lower-class letter writing
contain religious elements. While Barton (1975: 5) states that most of his analysed
emigrant letters written by Swedes in America “have in fact little of interest
to relate” and consist, among other things, of “religious platitudes”, more recent
research has identified the textual functions of religious formulae beyond the 
mere informational function. In her analysis of 19th-century Scottish correspondence,
Dossena (2013: 57) considers the religious element in letters to be a “highly
meaningful device for the expression of involvement and psychological proximity
between participants”. Thus, religious formulae can be “network-reinforcing


4 Cf. the formulaicity criterion of ‘deviation’ discussed in Wray and Namba (2003: 28). 
5 See a historical overview on the text type ‘letter manual’ in Schiegg (forthc.).
strategies” (2013: 50), by which a “writer expresses shared ethos with the recipient”
(2013: 54). Similarly, Rutten and Van der Wal (2012: 181) have identified the
‘Christian-ritual function’ as one of the main functions of epistolary formulae in 
17th- and 18th-century Dutch correspondence. Such formulae “usually place the 
writer and/or the addressee under divine protection, thereby manifesting the 
writer's religiosity”. The authors connect this with Coupland's (2007) sociolinguistic
theory of stylisation and interpret religious formulae as stylisations of
“ethical reliability” (Rutten and Van der Wal 2012: 181). Elspaß’s analysis of letters by
19th-century German emigrants to America has revealed a large number of biblical
quotations and proverbs, religious formulae and commonplaces that justify, ex- 
cuse or support an argument. Writers trust in the approval of general religious 
‘truths’ valid in their communities to avoid opposition. Thus, religious formulae 
can provide various aids in the formulation of letters (Elspaß 2005: 175-181). 


3 Religious Formulaicity in the CoPaDocs Letters: 
Corpus and Methods 


The Corpus of Patient Documents currently (March 2019) consists of about 2,000 
texts written by more than 200 different writers born in the 19th century. Most of 
the writers can be classified as ‘ordinary people’ without much writing experi- 
ence. They usually followed manual professions and had only attended primary 
school. Social data about these writers such as their professions, their prove- 
nance and family background as well as their financial circumstances can be re- 
trieved from the patient files. The medical diagnoses given in the files, however, 
need to be treated with caution and can definitely not be equated with modern 
medical diagnoses. The great majority of their texts are letters, but in the patients' 
files, we also encounter other text types, such as autobiographic texts, poems, 
and diaries. Some patients wrote religious texts that can be identified by their 
title, for example: ‘prayer before lunch’ and ‘evening prayer’ (ans-34 Johann G. 
A.⁶) or ‘prayer against blasphemous thoughts’ (kfb-80 Hans A.). There is a ‘sermon’,
written by the baker Franz O. (kfb-518), ‘confessions’ by the tailor Johann
V. (kfb-775) and the miller Josef W. (kfb-2058) and even a ‘prophecy’ leading to a 
supposed apocalypse in 1888 by the nailer Johann H. (kfb-789). Johann H. was 
suffering from religious delusions, and a large number of references to religion 


6 Patient IDs are structured in the following way: institution (e.g. ans - Ansbach; see the refer- 
ence section), file number (e.g. 34), first name, middle and last name. 
are found, particularly in texts by patients with such a diagnosis. Apart from one
example from a diary (see 14), however, in this article we only consider religious 
formulae that appear in letters. 

A characteristic feature of the corpus and of historical letters in general is a 
large number of spelling variations, not only between different writers but also 
within the texts of one and the same writer. Inexperienced writers in particular show this
kind of variation, which is influenced by their degree of education, their regional 
varieties and by individual stylistic factors. The absence of a consistent orthography
in our texts, along with a presumably low frequency of individual formulae,
precludes the application of corpus-driven methods to automatically extract formulaic
language, especially since corpus compilation is still in progress and lemmatisation
has not yet been conducted. In addition, the CoPaDocs corpus currently
comprises about 750,000 words, a size too large to be investigated on a
manual basis alone. Consequently, the methodological approach used here does 
not aim at an exhaustive extraction of all religious formulae. However, this is not 
the ambition of our study in any case. Instead, we aim at an illustrative analysis 
of the uses and functions of religious formulae in the CoPaDocs letters. For this 
purpose, we applied a mixture of different methods to extract a sufficiently large 
sample of relevant text passages. 

Our first approach was lexically based. We searched the corpus for single lex- 
ical items typical in religious contexts, assuming that these items should also fre- 
quently appear in formulae of religious provenance. Such key word searches 
were executed for the expressions Amen and Kreuz (‘cross’). In both cases, we 
integrated a large number of spelling variants. The resulting hits were manually 
reviewed in order to identify those examples representing religious formulae in 
the sense of this paper. Afterwards, we also conducted key word searches for the 
items Bibel (‘bible’), sagen, and sprechen (‘say’, including their morphological 
variants). The rationale was that some writers often use source-indicating routine 
expressions based on verba dicendi. By searching for the central constituents of 
such formulae, we were indeed able to identify a large number of explicit refer- 
ences to religious intertexts. 
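
The semi-automatic search described above can be illustrated with a short Python sketch.
It is not the tool used for the CoPaDocs searches, which the text does not specify; the
variant patterns, the function name find_hits and the size of the context window are
assumptions made for illustration only. The sketch matches spelling variants of a key word
with a regular expression and returns every hit together with its context, so that the
hits can then be reviewed manually.

```python
import re

# Hypothetical variant patterns: the paper does not list the spelling
# variants that were actually searched for.
VARIANT_PATTERNS = {
    "Amen": r"\bam+e[nm]\b",                          # Amen, Ammen, Amem
    "Kreuz": r"\b[kc]r(?:eu|eü|ei|ai|äu)t?z\w*",      # Kreuz, Kreutz, Creuz, ...
}


def find_hits(text, patterns=VARIANT_PATTERNS, window=30):
    """Return (key word, matched form, context) tuples for manual review."""
    hits = []
    for key_word, pattern in patterns.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            hits.append((key_word, match.group(0), text[start:end]))
    return hits


if __name__ == "__main__":
    # Closing words of example (3) as transcribed in this paper.
    for hit in find_hits("vita met ternam Amem"):
        print(hit)
```

Applied to the closing words of example (3), the sketch flags the form Amem as a variant
of Amen. Whether a hit actually represents a religious formula in the sense of this paper
still has to be decided by manual review.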

Besides these semi-automatic approaches, we examined the letters of one writer,
Martin B. (kfb-1621), with regard to religious formulae using a traditional
paper-and-pencil strategy. Being a day-labourer and suffering from delusions, he
is a prototypical example of an inexperienced writer who uses a lot of religious 
language in his letters. What is particularly interesting here is that Martin B. 
writes not only the relatively large number of 14 letters, but also letters to differ- 
ent addressees (his wife, his aunt, his brother-in-law, the mayor of his home 
town) in different social groups. Hence, in his letters he uses different registers 
(cf. Schiegg 2018), while the general function of his letters is relatively constant:
appeals to be released from the hospital. We thus have plenty of intraindividual 
variation here that also appears in his formulaic language. 

A difficulty in comparing the formulaic items with their possible sources re- 
sults from the large number of Bible versions and other religious texts that were 
common in the 19th century — and of course the writers’ own variations and mod- 
ifications. It was not possible to find one ‘best’ Bible version for all cases, so we 
decided to quote different versions to achieve a maximum degree of accordance 
with the wording in the letter. With regard to the English translations of the formulae,
we quote from the online version of the Standard King James Bible (KJV).


4 Occurrence and Usage of Religious Formulae in 
Patient Letters 


4.1 Explicit vs. Implicit Usage 


In this section, we will illustrate different ways in which religious formulae appear
and are integrated in the patients' letters. We can make a basic distinction be- 
tween formulations which are explicitly marked as a religious formula on the one 
hand and implicit references to religious sources on the other. 

In the letters, we find a variety of strategies used to explicitly mark the status 
of a text passage as a (religious) formula. In a number of cases, formulae are set 
apart from the rest of the text. Sometimes, this is realised by the arrangement of 
the page, e.g. by paragraphing and visible spaces between the body of the text 
and the formula and/or by explicit indications of the source text. This is illus- 
trated by examples (1) to (4). 

The farmer's daughter Magdalena S. (kfb-450) was suffering from religious 
delusions. Both of her letters, one to her sister (22.11.1857) and one to her friends 
(Christmas Day 1857; see example 1 and figure 1), begin with a biblical quote directly
after the date and address; the quotes span 4 and 3 lines respectively and
close with an identification of their sources. In each letter, the formula appears
in an individual paragraph. The letter to her friends even has an empty line after 
the formula. Magdalena S. presumably copied both formulae from a Bible or another
religious text that she had access to. An indication that she used a printed model text
rather than her memory is the full capitalisation of ‘HERRN’ (‘the Lord’; line 3), a common
practice for nomina sacra in printing (Nübling et al. 2017: 265):
Fig. 1: Beginning of Magdalena S.'s (kfb-450) letter to friends (Christmas Day 1857) (example 1)


(1) Farmer’s daughter Magdalena S. (kfb-450), letter to her friends (Christmas Day 1857): 
Irsee. Am h. Weihnachtsfest 1857. 
Meine liebsten Freunde! 
Die auf den HERRN harren, kriegen neue Kraft, daß sie
auffahren mit Flügeln wie Adler, daß laufen und nicht matt werden, 
daß wandeln und nicht müde werden. Jesa. 40, 31. 


The complexity and length of this quotation, the adherence to contemporary writing
conventions (the capitalisation of HERRN ‘the Lord’) and the correct quotation
of the source make us assume that Magdalena S. did not cite this sentence
by heart, but copied it from a written source. A search through the texts between 
1800 and 1857 available as digital versions in the catalogue of the Bavarian State 
Library produces hundreds of results for this formula. None of them, however, 
has the exact wording used in the letter, so we do not actually know Magdalena 
S.’s textual version. Nevertheless, a fairly good correspondence can be found, for 


7 In all cited examples, passages identified as religious formulae are printed in bold. Text in 
Latin script appears in italics, text in blackletter in roman font. 
example, in a 48-page religious consolation book for the sick (Kindler 1848).⁸
Both formulae are only slightly connected to the content of the letters, in which 
Magdalena S. complains about her situation in the psychiatric hospital, while the 
formulae convey a more positive mood: the transitory nature of all misery.

Other strategies to highlight formulaic language are punctuation (especially 
quotation marks), indications of the source, metalinguistic introductions and any 
combinations of these features. Such forms of highlighting, however, are rather 
rare in the CoPaDocs corpus, which consists mainly of texts by inexperienced 
writers. The rare examples we do find in the corpus generally come from more 
educated writers and mostly appear in more recent texts (beginning of the 20th 
century) at a time when orthographic standardisation had begun. Example (2) 
illustrates these strategies. In a letter to her mother, the deaconess Katharina S. 
uses a direct quote from the Psalms: 


(2) Deaconess Katharina S. (kfb-3085), letter to her mother (01.05.1925): 
Ds. Wort des Psalmisten: „Freuet euch mit Zittern“, möchte ich Dir zurufen. 


The formula is syntactically integrated into the context, but visually separated
by two quotation marks, a colon at the start, and a comma at the end. In addition, 
the writer names the source, though not the exact place in the book (Psalms 2:11). 
She probably quotes the short sentence by heart, remembering it because of her 
religious profession. 

Different languages and/or scripts, e.g. blackletter vs. Latin script (cf. 
Schiegg and Sowada 2019), can also be used to highlight religious formulae. This 
practice mainly occurs in contexts of code switching when religious formulae appear
in foreign languages, especially, but not only (see example 4), in Latin. Lower-class
writers did not usually have any knowledge of foreign languages, so they
had to transcribe phonetically from memory. The farmer Georg W. provides such 
an example (3; see figure 2) in a letter to his parents, where he closes with a Latin 
sentence that he probably remembered from the Catholic church services held in 
Latin before the Second Vatican Council in the 1960s. After his signature, Johann 
Georg W., he finishes his letter by reciting the priest's absolution together with 
the response Amen, a common closure formula that we often found in patient let- 
ters (see section 5.2). Given that the writer does not have any command of the 


8 “Die auf den HERRN harren, kriegen neue Kraft, daß sie auffahren mit Flügeln wie Adler, daß 
sie laufen und nicht matt werden, daß sie wandeln und nicht müde werden. Jes. 40, 31.” (Kindler 
1848: 7). 




Latin language, the quality of his phonetic transcription is all the more remarka- 
ble — and a strong indication of a high language competence, which is generally 
characteristic of his letters (cf. Schiegg 2015: 155-177). 


(3) Farmer Georg W. (kfb-1720), letter to his parents (05.05.1890):
Missereatur vestris n omnipets deus die misis 


bekatus vestris vertukatvos im, 'vita met ternam 
Amem 




Fig. 2: Closure of Georg W.’s (kfb-1720) letter to his parents (05.05.1890) (example 3)


The original Latin text appears as follows: Misereatur vestri omnipotens deus, et 
dimissis peccatis vestris, perducat vos ad vitam aeternam. — Amen (Graeser 1829: 
342). Although Georg W. writes his signature in Latin script, he uses blackletter
for the Latin closure, as he does in the rest of his letter.

The case is different in Maria R.’s letter to her siblings, where she adds the 
last words of Jesus on the cross, transmitted in Aramaic in the gospels of Mark 
and Matthew⁹, in the left margin of her letter, using Latin script (example 4). She
introduces this formula with the German Bald heißt es (‘Soon, one can say’) in 
blackletter. 


9 See Mt 27:46: „um die neunte Stunde aber schrie Jesus mit lauter Stimme auf und sagte: Eli, 
Eli, lema sabachthani? Das heißt: Mein Gott, mein Gott, warum hast du mich verlassen?“ (ELB) 
— ‘And about the ninth hour Jesus cried with a loud voice, saying, Eli, Eli, lama sabachthani? 
that is to say, My God, my God, why hast thou forsaken me?’. 
(4) Maria R. (no job specified) (kfb-994), letter to her siblings (28.03.1900):
Bald heißt es Eli, Eli Lama Sabaktani 
eau sum nattum est 


Fig. 3: Page 1 of Maria R.'s (kfb-994) letter to siblings (28.03.1900) (example 4); Aramaic and 
pseudo-Latin formula in the left margin 
Fig. 4: Detail from figure 3 - Aramaic and pseudo-Latin formula in the left margin


Quoting Christ's final words, Maria R. draws a parallel between the last hours of 
Christ and her own poor health. She died in the psychiatric hospital ten months
after writing this letter. Using a religious formula to equate a biblical element 
with the patient's own situation is a practice that we commonly find in patient 
letters, particularly related to the passion of Christ. The words following (line 2) 
are also written in Latin script, but do not seem to be connected to the rest of the 
formula. We can recognise some elements of the Latin language (sum ‘I am’;
nattum ‘born (neuter)’; est ‘he/she/it is’) that Maria R. transcribes phonetically 
from memory (nattum usually has only one t), probably without knowing their 
precise meaning. These words do not seem to make much sense and Maria R. pre- 
sumably inserts some text in the religious language of Latin to place emphasis on 
her previous formula. 

While the examples cited so far provide explicit indications of their status as
a (religious) formula, in the majority of cases in our corpus the use of the religious 
formula remains implicit. Such an example is the following: 


(5) Day-labourer Martin B. (kfb-1621), letter to his wife (10.03.1896):
der Geist hat h aine Unsihbare Graft man mus es aus Füren mit getangen Worten und 
Wergen kain Mensch kans niht Ferstehen und weist niht wi noh was dan Baist di Maus 
Eher kainen Zwirn niht ab bis for bai ist 


In a letter to his wife, day-labourer Martin B. includes a passage of the Confiteor, 
recited by Christians as part of the penitential act. In its German version, the Con- 
fiteor contains the trinomial in Gedanken, Worten und Werken ('in my thoughts 
and in my words, in what I have done [and in what I have failed to do]’). As part 
of the prayer, this formulation is extremely fixed and clearly an instance of formulaic
language. In the letter, it is completely integrated into its syntactic context
and is not marked in any way as a formula. This also holds for the other fixed 
phrases in the same passage: weist niht wi noh was (‘he neither knows how nor 
what’) and dan Baist di Maus Eher kainen Zwirn niht ab (literally: ‘then, the mouse 
rather does not bite off no thread’). It is obvious that such unmarked uses of religious
formulae are rather difficult to identify in texts. At the same time, the fact
that the writers often see no need to explicitly mark religious formulae can be 
regarded as an indication of their status as highly familiar and commonly used
units. 


4.2 Variation and Modification 


As already mentioned in section 3, formulaic language in the CoPaDocs letters 
reveals a high degree of variation. Example (5) illustrates peculiarities in several 
linguistic areas that can be explained as diastratic, diatopic and individual vari- 
ation. In the (non-religious) formula dan Baist di Maus Eher kainen Zwirn niht ab, 
for example, we find a double negation in kainen ... niht, which is a syntactic 
structure that was highly stigmatised in the 19th century and only used in the 
written language of less-schooled writers (Elspaß 2005: 276). This construction is
also diatopically marked, as it appears predominantly in southern German (Els- 
paß 2005: 281). In the same formula, we see the lexeme Zwirn (‘twine’), while the 
formula usually has the near-synonym Faden (‘thread’) (Duden 2013: 498). Indi- 
vidual variation (cf. Barz 1995) also appears at the orthographic level, where we 
find individual characteristics for this writer, for example the phonetic spelling 
<ai> for /ai/ (Baist, kainen) instead of the conventional <ei> (see also examples 6a
and 6b). 

Individual variation can particularly be observed when writers repeatedly 
use formulae. As Martin B.’s letters all have the same intention, namely being released
from the hospital, their contents overlap and some of the (religious) formulae
appear more than once: 


(6) Day-labourer Martin B. (kfb-1621), letters to his wife ([a] 28.01.1896 and [b] 10.03.1896) 
and letter to the mayor ([c] 07.05.1905): 
[a] der Gaist Gottes ist ain langer warter aber ain sehr schtringer Bestrafer 
[b] Gott ist ain Langer Warter aber ain ser Strenger Bestrafer 
[c] Gott ist Ein Langer warter aber ein Strenger Bestrafer 


In three different letters, Martin B. uses a religious formula that is derived from 
Psalm 7:12 Gott ist ein gerechter Richter und ein Gott, der täglich strafen kann.
(ELB) (‘God judgeth the righteous, and God is angry with the wicked [...]’). All
three instances have a common general inventory of lexical material, but at the 


10 The <ai> spelling for the former /ei/ diphthong was a typical Upper German spelling in the 
16th and 17th centuries (Schmidt 2013: 370). Afterwards, however, Upper German texts followed 
those from the rest of the German speaking area and switched to <ei> (name spellings remained 
<ai>). Thus, we can classify the <ai> spelling in 19th- and 20th-century letters as individual rather 
than diatopic variation. 
same time show variation. Lexical variation can for example be observed in the
particle sehr (‘very’) that intensifies the modifying adjective streng (‘severe’), but
only appears in the two letters to his wife (a and b) and not in the one to the mayor (c).
Similarly, the spelling variant <ai> instead of the conventional <ei> in ain and Gaist
appears in all five possible positions in the letters to his wife, but in neither
of the two instances of ein in the letter to the mayor. These differences between the
first two formulae and the third may be due to chance or to the gap of almost ten years
between Martin B.'s letters to his wife and that to the mayor. They may also result from
the different addressees. Martin B. in general shows less conventional spellings in his
private letters (cf. Schiegg 2018), and, as we will see in section 5.1, the formulae to his
wife have a more intense and threatening character, which is indicated by the
intensifier sehr. Therefore, this kind of variation may also be explained by his
application of different linguistic registers. The phonetic spelling schtringer in (6a)
supports that hypothesis. It appears only in one of the letters to his wife, and
shows the palatalisation of /s/ in initial position before a consonant, which is usually
spelt <s> and not <sch>, as well as an assimilation of /e/ to /i/ (written <i> by
Martin B.) resulting from the following velar consonant /ŋ/. If we assume
that Martin B.'s variation here is intentional and has a communicative purpose, 
we approach the interface between variation and modification of formulaic lan- 
guage. 

In phraseological research, modifications are usually defined as occasional, 
intentional and context-bound variations of formulaic items (Pfeiffer 2018: 51). 
The tendency to creatively modify formulaic language is often regarded as a char- 
acteristic development of media texts in the second half of the 20th century (e.g. 
von Polenz 1999: 381). Our corpus of letters by inexperienced writers at the turn 
of the 19th century, however, shows that this practice can also be found earlier 
and in other text types. A variety of religious formulae are indeed modified for 
their specific context. This will be illustrated in examples (7) to (9). 

At the end of the following text excerpt (example 7), the tailor Pius G. uses 
the canonical form of the religious formula ans Kreuz mit ihm (‘crucify him’), rec- 
orded in all four gospels (Mt 27:22-23; Mk 15:13-14; Lk 23:21; John 19:6, EU). These 
words are shouted by the crowds to demand that Jesus be crucified after his trial 
in the praetorium before Pontius Pilate. What is interesting here is the context of 
the formula, namely the continuous equation of the situation of the writer with 
the passion of Jesus Christ. A couple of lines before, the writer reports his own 
trial before a Bavarian court of justice. In his view, the trial was nothing but a 
show trial, ending with his condemnation and the acquittal of two fellow
defendants. In terms of content alone, the parallels to the passion of Jesus are
evident. However, the parallelisation is also realised and even intensified by
linguistic means, namely by modifying the formulaic item ans Kreuz mit ihm to
ins Narrenhaus Haus mit Ihm. The writer substitutes the constituent Kreuz
with Narrenhaus (‘madhouse’), thus not only comparing his situation with the
passion of Jesus but also establishing a link between the concepts of ‘cross’ and
‘madhouse’. This conceptual identification is quite frequent in the letters
investigated (see section 5.1.2).


(7) Tailor Pius G. (kfb-936), letter to the local government (03.02.1890): 

Wirkl. Würklich 756 sich der sogenannte Gerichtshof Zurük — und brachte die unzurech- 
nungsfaghigkeit fertig Gegen einen Bürger u. Soldat von unbescholtenem Ruf Die zwei 
Meuhelmörder wurden Freihgesprochen, Und das Wittelsbacher Haus sammt seiner
Schand Regierung war Gerettet. Jetzt ins Narrenhaus Haus mit Ihm! Dieses waren
die ersten Worte welche ich hötrte, als ich heraus kam, ich aber ging ins Hofbräuhaus!
Wo mir unwillkührlich die Geschichte eines Iesus einfiel dem auch sein Haupt Verbre- 
chen darin bestand, weil Er den Banditten von Gottes Gnade, die Wahrheit sagte. Ans 
Kreuz mit Ihm! An dem Er noch heute hängt, zum Schreken für diejenigen welche es 
je wieder wagen sollten, Gegen derartige Verbrecher, und gedungene Knechte zu Zeu- 
gen! 


The following two passages show further examples of modified religious formu- 
lae. 


(8) Day-labourer Martin B. (kfb-1621), letter to his wife (10.03.1896): 
du soltest fre froh sein wan ich zu dir nohmal gehen wirte dan wirte es Dir nohmal 
beser Se" aber ich wil dihr niht mehr über lestig sein Du wirtest mih sofort wider hiher 
bringen ich hab in Krumbah auh witer kute Fraind di sich um meiner Annemmen ten 
ich kan noh Arbeiten Der Mensch tengt und der Böse Geist lengt oft mer als der
gute Geist 


(9) Day-labourer Martin B. (kfb-1621), letter to the mayor of his home town (06.01.1901): 
Mein Weib und meine Tochter haben mich von der Anstald Kaufbeiren wider abgeholt! 
Es hat mich sehr gefreüt daß ich so gut wider entlaßen worden bin; aber meine 
Freüde welche ich gehabt habe, ist wieder in Trauer verwandelt worden; 


In example (8), Martin B. refers to the formula der Mensch denkt, Gott lenkt (‘man 
thinks, God directs’), which is not a direct quotation from the scriptures but can 
be traced back to the book of Proverbs 16:9: Das Herz des Menschen plant seinen 
Weg, aber der HERR lenkt seinen Schritt. (ELB) (‘A man's heart deviseth his way: 
but the LORD directeth his steps.’) and the derived and shortened Medieval Latin
phrase Homo proponit, sed deus disponit (Duden 2017: 382). Again, the writer does
not use the formula in any canonical form, but rather adapts it to the specific
context. In his letters, Martin B. often claims that his wife is possessed by an evil
spirit, which has pushed aside her once good character. In his view, it is this evil
spirit that prevents her from getting him out of the hospital. To express this view,
he modifies the formula in two ways: first, he substitutes the component Gott
(‘God’) with der Böse Geist (‘the evil spirit’). Second, he expands the formula
with the comparative structure oft mer als der gute Geist (‘often more than the
good spirit’). Examples like this show that even less-experienced writers
produce quite complex modifications to support their communicative goals.

The modification in (9) also adapts a biblical quotation to the context of the 
letter. The highlighted passage refers to John 16:20: ihr werdet traurig sein, aber 
eure Trauer wird sich in Freude verwandeln (EU) (‘and ye shall be sorrowful, but 
your sorrow shall be turned into joy’). By substituting the possessive article (eure 
vs. meine), inserting the relative clause (welche ich gehabt habe) and permuting
the constituents joy and sorrow, the writer adapts the formula to his situation:
shortly before, he had been released from the psychiatric hospital, only to find
himself rehospitalised after a couple of days. So his joy at being released has
indeed turned into sorrow again: the biblical promise of salvation has not come
true but has rather been turned upside down for him.


5 Functions of Religious Formulae in Patient Letters


5.1 Argumentative Functions 


From a text-pragmatic perspective, the patients’ letters are polyfunctional,
serving appellative, contact-orientated and informative functions. The dominant
function of most letters is directive-appellative: the patients ask the addressees
to do something for them, most often to secure their release from the hospital, but
also to visit or to send a parcel from home. This request is usually supported by
argumentative textualisation patterns, which shape major parts of the text and even
whole letters. For the purposes of this paper, the question is what role religious
formulae play in these contexts.

The typical argumentation in patient letters is deontic, with the thesis/conclusion
being I [the writer] should be released from hospital and/or you [the addressee]
should help me be released from hospital. To support this, the writers often claim
that they were mistakenly hospitalised in the first place since they are not
actually ill. The addressees are regarded as responsible for the writer's
hospitalisation, so, from the writer's perspective, not only are they capable of
effecting the writer's release, but it is also their moral responsibility to do so.
A possible strategy here is to WARN the addressees of what will happen if they do
not fulfil this moral obligation, or to THREATEN them with it. These acts are often
realised or supported by the use of religious formulae. The day-labourer Martin B.,
for instance, uses this pattern in very similar ways in five of his letters:


(10) Day-labourer Martin B. (kfb-1621), letters to his wife ([a] 11.01.1895, [b] 28.01.1896, [c] 
08.06.? and [d] 17.07.?) and to the mayor of his home town ([e] 05.07.1905): 
[a] dengst du nih an das Sterben so wirst den Himmel mimals Erben u denge an das 
Gottes Keriht was unser Her Gott mit dihr dann sbirht 
[b] tenge ainmal nah Das du Sterben must und Rehenschaft ab Legen must for Gottes 
Geriht 
[c] dise heren solen auh Tengen an den Tot den unser Her Gott ist ain Stringer Rihter 
[d] dengst du niht an das Sterben so wirst du auh den Himmel niht Erben d und denge 
an das lötzte Gehriht was unser Her Got mit dir dan Sbriht
[e] Dengen sie Daß sie auch einmal sterben müssen und Gottes Gericht rehen schaft
ablögen müssen


The cited passages contain another interesting type of formulaic variation. While 
the concrete lexical realisation of the pattern varies, the individual units of argu- 
mentation and their sequence are remarkably constant. In a first step, the writer 
reminds the addressee of his or her own death, actualising the famous vanity
topos of memento mori. This topos can be traced back to Psalm 90:12: Lehre uns 
bedenken, daß wir sterben müssen (LU, ‘So teach us to number our days’). In all 
five cases, the memento mori is followed by an explicit warning of the Last Judge- 
ment, where the addressees will have to account for their misconduct, especially 
for not supporting the writer's release. 


5.1.1 Topical Argumentation 


Referring to God and other religious authorities is an argumentative strategy the 
writers repeatedly use to prove and force their point. In these contexts, religious 
formulae are of particular relevance due to their status as endoxa, whose truth or 
validity is widely taken for granted as an effect of their source. Such source-based 
authority provides a specific potential for argumentation. In conventional argu- 
mentative contexts, speakers are pragmatically committed to proving the truth or 
validity of their thesis statement. Referring to the authority of a certain person or 
source, however, can obviate the need for actual argumentation. In these cases, 
we come across instances of topical argumentation, the general mechanisms of 
which were already described by Aristotle (cf. Aristoteles, Topik, 116a). Using the
authority topos is probably the most typical form of topical argumentation.
According to Klein (2001: 1317), the underlying pattern and conversational effect of
the authority topos can be described as follows: if a person or entity associated
with high authority produces an utterance U or performs an action A, it is likely that
U is true or valid / that A is appropriate. In such cases, it is superfluous to support
one's positions in a truly argumentative way.

Our key word search for sagen (‘say’) and its morphological variants yielded
the following examples (11) to (14), which illustrate the use of religious formulae
in this form of topical argumentation:¹¹


(11) Plumber Friedrich Wilhelm S. (ham-20087), letter to diverse official addressees 
(29.02.1936): 
Somit erklähre ich vor Gott und der ganzen Menschheit auf Erden alle Fahneneide und 
Treueeide für Ungültig da Gott seinen Heiligen Namen führ solche Sauereien nicht
hergibt. Jesu sagte Deutlich wehr mit dem Schwert tötet wird durchs Schwert
Umkommen Gott brauch keine Soldaten.


(12) Mill labourer Georg Sch. (kfb-1763), letter to Bavarian bishops (undated): 

Da der Papst die Sache nicht versteht, hatte Er gar nicht verlangen sollen, daß man die 
Unfehlbarkeit glauben muß, ohne Belehrung u. Beweiß. Dazu erinnere ich an die 
Apostel. Jesus war, u. ist, eine höhere Person als der Papst, u. die Apostel haben nicht
geglaubt, u. der Ap. Tomas sagte, wenn ich nicht meine Hand in seine Wunden 
lege, glaube ich nicht. Demnach wird es erlaubt sein, zu fragen, ob das recht ist, 
wenn man dieße welche auch nicht glauben ohne Beweiß, von der Kirche ausschlüßt. 
Ich erinnere auch, daß Jesus befohlen hat, das Unkraut nicht ausrotten. 


In the context of the first passage, plumber Friedrich Wilhelm S. sharply criticises 
the militarisation of Germany in the 1930s and the common practice of religiously 
motivating oaths of loyalty and allegiance. Instead, he pleads for demilitarisation
and a pacifist attitude from the church and society. To support this, he refers to a 
famous quote from Jesus, uttered when one of his disciples violently tried to pre- 
vent his arrest after the betrayal of Judas Iscariot: wer das Schwert nimmt, der soll 
durchs Schwert umkommen (Mt 26:52, LU, ‘all they that take the sword shall perish 
with the sword’). By quoting Jesus and the scriptures, the writer has recourse to 
the authority of the source. At least for fellow Christians, it is not easy to take the 
opposite view. 


11 The effects of the usage of religious formulae in topical argumentation have also been
described for other text types, for example emigrant letters (see section 2.2).

This function can also be found in example (12). In a letter addressed to all
Bavarian bishops, mill labourer Georg Sch. basically argues that he cannot be in- 
sane since he always acts according to Jesus’s model. So if the writer were insane, 
Jesus would necessarily have been insane, too. The letter contains theoretical re- 
flections on a number of theological issues, particularly on confessional schism. 
In the passage quoted, he asks the bishops not to excommunicate religious
sceptics since Jesus had not expelled sceptics among his disciples either. The re- 
quest is substantiated with reference to two biblical formulae: the well-known 
quote from the apostle Thomas (“doubting Thomas”) wenn ich meinen Finger 
nicht in das Mal der Nägel und meine Hand nicht in seine Seite lege, glaube ich nicht 
(John 20:25, EU, ‘except I shall [...] put my finger into the print of the nails, and 
thrust my hand into his side, I will not believe’) and Jesus's verdict that the tares 
should not be rooted out to protect the wheat in the parable of the Wheat and 
Tares (Mt 13:24–30).

A specific variation of the authority topos is characteristic of writers suffering
from religious delusions. These patients often claim that they act in a particular
way because God, or other persons or forces in his name, has explicitly
told them to do so. The action is thus justified as God's will. In these contexts, 
too, religious formulae play an important role since the religious authority often 
uses words from the scriptures. This may be illustrated by the following exam- 
ples: 


(13) Farmer's daughter Magdalena S. (kfb-450), letter to friends (25.12.1857): 
bei mir gillt das Wort auch, wie bei Abraham, als der allmachtige Gott sagte; Gehe aus 
deinem Vaterlande, und aus deiner Freundschaft, und aus deines Vaters-Haus, 
in ein Land das ich dir zeuge; denn daselbst will ich dich zum großen Volk ma- 
chen. Gerade diese Worte hat Gott zu mir gesagt, ehe ich nach Neuendettesaus wan- 
derte [...]. 


(14) Shop owner's daughter Anna K. (kfb-2585), diary (27.04.1852): 

Mir war während des Schlafes als sei ich im Hause meiner Eltern und mein Bruder 
Edmund brachte mir zwei Briefe in denen ich Nachricht bekam: „Es sei unserm Könige
Maximilian während der Nacht u. des Gebetes ein Engel erschienen der ihm gesagt, 
dieses /:nämlich ich:/ ist die Maria die Jungfrau, die den Messias gebären soll“! 
Als ich erwacht war sagte das Wesen in meinem Jnnern,: fliehe! Sie werden Dich und 
mich zu tödten suchen! Nimm Dein Kind und gehe zu unserm Könige! - Und dann 
ist mirs als müße ich mit Enzler fort! 


In both passages, the use of the religious formula not only motivates and justifies 
a particular action of the patient’s, but also establishes a parallel between the 
writers and the biblical persons who were the original addressees of the respective
words (Abraham and Joseph) (see section 5.1.2).

In example (13), it is God himself who speaks to Magdalena S. with the same
words he once used to call Abram (Und der HERR sprach zu Abram: Gehe aus deinem
Vaterlande und von deiner Freundschaft und aus deines Vaters Hause in ein
Land, das ich dir zeigen will. Und ich will dich zum großen Volk machen; 1. Mose
12:1-2, LU, ‘Now the LORD had said unto Abram, Get thee out of thy country,
and from thy kindred, and from thy father's house, unto a land that I will shew
thee: And I will make of thee a great nation’).

While the wording in (13) largely conforms to the biblical source, the formulae
in (14) exhibit more adaptations to the context. In her diary, written in 1852 in
the hospital in Augsburg before she was transferred to the psychiatric hospital
in Irsee, Anna K. refers to the Archangel Gabriel's Annunciation, whereby she
herself is addressed as the Virgin Mary, who will give birth to Jesus (cf. Lk 1:31).
However, the angel does not appear to the writer herself but to the then-king
Maximilian II of Bavaria, who forwards the angel's prophecy. The religious authority
of the original source is enriched with the earthly authority of the king as the
initial bearer and receiver of the message – an interesting “double” use of the
authority topos. In a further instance of topical argumentation, Anna K. then reports
a voice from inside her, adopting the words of Mt 2:13: Steh auf, nimm das Kind
und seine Mutter und flieh nach Ägypten; [...] denn Herodes wird das Kind suchen,
um es zu töten (EU, ‘Arise, and take the young child and his mother, and flee into
Egypt, [...] for Herod will seek the young child to destroy him’). These are the
words of an angel of the Lord who appears to Joseph in a dream after Jesus's birth,
and Anna K. interprets them as an instruction to leave with Dr Enzler, one of the
hospital's doctors.


5.1.2 Affirmation by Parallelisation 


As we have seen, the patient texts contain a large number of cases in which the 
writer parallelises his or her own situation with a religious event, person or topos. 
Our key word search for Kreuz (‘cross’) in CoPaDocs yielded several examples in 
which the writer's situation (being in the hospital) is metaphorically
conceptualised as the cross he or she has to bear.¹² The farmer's daughter Marie V.,
for example, wrote the following letter (cited in full) from a psychiatric hospital in
Lower Bavaria: 


(15) Farmer’s daughter Marie V. (mkf-987), letter to the institution (1932/33):
Sehr geerhter urnd liebefevolste Direktion Arzte Schwester Ihr seid mir die Stufe im
Kreuzweg N 6 ich die Verronika Ihr der liebe Heiland tief trügt Ihr das Bildniß der Liebe
in mein Herz aber ich muß wandeln die 2te Jesu nimt das Kreuz auf seine Schuldern
Ich bitte Euch von Herzen verzeihet mir der liebe Got sol für Euerer Liebe znu uns alles
vergelten Ein iniges Vergelts Gott

12 We have already discussed two of them: Maria R. (example 4) draws a parallel between the
last hours of Christ and her own poor health by including Christ's last words in Aramaic in her
letter; Pius G. (example 7) connects the passion of Christ and the cross with the asylum.


In this letter, she thanks the institution's doctors and nurses and compares her- 
self with Veronica, who, according to Christian legends, offered Jesus a piece of 
cloth to wipe blood and sweat from his face at the sixth station along his Via Do- 
lorosa. The veil afterwards bore an imprint of Jesus's face that this letter interprets 
as a portrait of love that had been offered by Jesus (i.e. the institution's doctors 
and nurses) to Veronica (i.e. Marie V.). The perspective then changes and the 
writer calls herself “the second Jesus” who has to bear the cross (i.e. her illness) 
on her shoulders. These images have parallels in the Bible (cf. Mk 15:20, Mt
27:32, Lk 23:26, John 19:16), but the Stations of the Cross do not appear in detail in
it. They are derived from late medieval developments in Christian liturgy, when
the story of the Via Dolorosa became a central element of meditation (Köpf 1996: 
728). In several religious publications, we find the story of Veronica and her veil, 
as well as the formula Jesus nimmt das Kreuz auf seine Schultern (‘Jesus carries 
the cross on his shoulders.’), for example in a 19th-century meditation book on 
the Via Dolorosa for use in church in Passau, Lower Bavaria (Schmid 1855: 6). 
Marie V.'s use of Christian imagery and religious language allows her to draw 
parallels both to her state of health and to the doctors' and nurses' benignity. 


5.1.3 Expression of a Shared Ethos 


It has often been noticed that formulaic language plays an important role in ad- 
dressing one's audience (cf. e.g. Fleischer 1997: 218). In argumentative contexts, 
for instance, writers often resort to formulaic language to express shared 
knowledge, assessments and values (Pfeiffer 2016: 227—234). In the CoPaDocs let- 
ters, too, writers allude to religious formulae to establish a connection and create 
a social community with their addressees. While Martin B. predominantly uses 
religious formulae to warn and threaten his addressees, mainly his wife — see the 
five letters cited in example (10) - his formulae are not restricted to this function. 
In the few religious formulae¹³ to the other addressees, the mayor of his home
town, his aunt and his brother-in-law, we can observe other functions. Example
(9) has already demonstrated the use of a religious formula in a letter to the mayor,
in which he illustrates his sorrow at being brought back to the hospital with a
modified formula that lends emphasis to his general argument that he should be
released. He also includes a religious formula in a letter to his aunt:


(16) Day-labourer Martin B. (kfb-1621), letter to his aunt (15.05.1899):
du schreibst mir ich solle auf Gott vertrauen er sei der Helfer in der Noth ich werde
dann immer beßer den Himmel verdinen wann es auf dieser Welt keine Barmherzigkeit
giebt für "^ dann ist mein langes warten vergebens die Barmherzigkeit muß ich
von Menschen Hülfe erhalten


In this passage, Martin B. refers to a letter that he received from his aunt (not part
of his file), in which she tried to calm him down by asking him to trust in God as
a helper in any distress (cf. Psalm 124). He points to this shared knowledge but
emphasises the importance of mercy on earth and thus the aunt's duty to help
him. Although his text here is still appellative, the writer does not try to reach his
goals by means of a threat, but by referring to common knowledge and expressing
a shared ethos with the recipient, a practice that has been observed in other
contexts (Dossena 2013; see section 2.2). The threats appear only in the letters to
his wife and in his last letter to the mayor, when he may have realised that the
politeness of his earlier letters would not lead to the goals envisioned.

13 There are only four religious formulae in letters addressed to people other than Martin B.'s
wife, three in letters to the mayor and one in a letter to his aunt. In the letters to his wife, we have
observed 32 religious formulae. The number of tokens addressed to his wife is higher (about
5,100 words) than to the other addressees (together about 2,750 words), but the religious
formulae are still clearly dominant in the letters to his wife. Religious lexis in general is
significantly more frequent in the letters to his wife than to the other addressees.


5.2 Text-Structural Functions


While our analysis has up to this point focussed on functions of religious formulae
with regard to the content of the letters, we can also find formulae that primarily
help to structure letters (cf. Stein 2003: 123-132). As we have seen above,
Magdalena S. (example 1) starts her letter with a religious formula, while Georg
W. (example 3) closes his letter with a religious formula plus Amen. To find out
whether this practice of closing letters with Amen also appears in other patient
letters, we searched the corpus for Amen. In several cases, Amen is conventionally
used to close prayers that are recited in the patients' texts. Some are similar
to Georg W.'s example, where a text ends with a religious formula that closes with
Amen. Often, writers add references to a shared religious context (see section
5.1.3) to the end of their letters and close this section with Amen. This is the case 
in example (17), where Josefa M. asks her brother Martin to pray for her or have a 
mass read for her and closes this section with Amen. Afterwards, she adds two 
more conventional closure formulae (‘I now close my letter...’ and ‘You are 
greeted cordially’; cf. Elspaß 2005: 163), her informal nickname Babett, and fi- 
nally a short addition that Martin's saint's day is approaching (11th November), 
with which she again tries to establish a connection with him: 


(17) Seamstress Josefa M. (kfb-2211), letter to her brother (09.11.1913): 
[...] bete auch für mich oder laß für eine schwer kranke eine hl. Messe lesen. Amen.
Ich beschließe nun mein Schreiben u. hoffe daß Du mir nicht zürnst. Du seiest also
aufs herzlichste von mir gegrüßt u. die Deinigen. Babett kommt jetzt auch bald der
Namenstag. 


The corpus search also produced cases where the formula Amen is used in a
text-structuring function, but outside religious contexts. After closing and signing
his letter, Anton H. (example 18) adds another few sentences in which he repeats
his plea to be collected from hospital. He emphasises that he has turned into a
different person, healthier and able to work, and that he behaves decently. He
closes this assurance, which is unconnected to any religious domain, with Amen.
This short formula thus helps to fulfil the communicative task of closing a letter:


(18) Tailor Anton H. (kfb-1789), letter to his family (08.06.1900): 
Hollet mich balde ihr werdet sehen Ich bin auch ein andere Mensch geworden, gesun- 
der und rechter ihn der Arbeit, Arbeiten thue Ich auch ihmer den Tagen, ganz mache 
meine Sache ihmerund recht. Amen. 


Using Amen at the end of their text provided inexperienced writers with an option 
for marking the end of their letters with an affirmative formula. This secularised 
use of Amen in a text-structural function does not seem to have been described 
for German lower-class writing before. We find very similar uses of Amen as a
closure formula in the Dutch Letters as Loot Corpus (Rutten and Van der Wal 2014:
106, 182). To get a first impression of whether this function can be found across
individual corpora, we also searched for uses of Amen in the corpus of 19th-century
emigrant letters from Germany.¹⁴ Interestingly, we indeed found a number of
similar uses in these letters, too. For example, farmers from Norwood in Cincinnati
ended a letter to their family with greetings to all relatives, followed by Amen, 
place and date of writing and their signatures: 


(19) Farmers Dedert and Johanna Farwick, letter to family (March 1851):
Wir grüßen Euch alle, Vater, Mutter, Bruders, Schwesters und Schwagers und alle
unsere Verwandten. Amen. Cincinnati Marz 1851 Dedert und Jan Farwick

14 We thank Stephan Elspaß for his permission to use and quote from this unpublished corpus.


6 Conclusion and Outlook 


Our paper has illustrated the spectrum of occurrences and functions of religious
formulae in the CoPaDocs corpus, which comprises letters from psychiatric hospital
patients born in the 19th century. Since most of the letters were written by
lower-class people with a low level of education, they permit an insight into
the use of formulaic language by ordinary people in the 19th century.

Some of our findings can be regarded as specific to the letters investigated 
here. This holds in particular for passages where patients suffering from religious 
delusions claim that God or other religious authorities have told them to do
something using religious formulae known from other contexts. Another interesting
observation was that lower-class writers in the 19th century already exploited the 
potential of formulaic language for occasional modifications, often in a remarka- 
bly complex and creative manner. This shows that the tendency for creative use 
of formulaic items has a long tradition and is not a development of recent dec- 
ades. Interestingly, the modifications we found do not seem to aim at wordplay 
but are most obviously produced to achieve particular communicative goals. 

The majority of our findings, however, rather confirm observations that have 
already been made for 19th-century lower-class letter writing in previous studies.
Prominent features such as a high degree of variation in formulaic language use, 
and a tendency to refer to religious formulae to express shared values and to
utilise the advantages of topical argumentation are not specific to the corpus
investigated, but seem to be characteristic of 19th-century lower-class letter writing in
general. This also seems to apply to the secularised use of Amen in a text-struc- 
tural function. 

This volume is dedicated to Elisabeth Piirainen, who in a number of papers 
and books has researched widespread idioms in Europe and beyond (e.g. Piirainen
2012, 2015). Continuing this approach to formulaic language, contrastive
studies – not only across the borders of single letter collections but also across
different languages – might offer a variety of new insights, especially with regard
to a “pan-European tradition of letter writing” (Rutten and Van der Wal 2013: 52)
and the question of “whether there existed something like a central European
stock of letter writing formulae and how they could have evolved or how they
were transmitted into the different languages” (Elspaß 2012: 60). Our findings on
the use of Amen could be related to this idea. However, the various writers’
individual attempts at closing a letter by means of this formula may just as well have
resulted from their shared European cultural and religious backgrounds.

In any case, religious formulae in historical lower-class writing have been 
shown to be a promising area for historical research on formulaic language. The 
corpus of historical patient documents provides new data that can shed light on 
the occurrence and functions of religious formulae in these letters, giving us
valuable insights both into the writers' religious knowledge and into their
competence in transforming this knowledge into a letter.